Abstract
Keywords
1. Introduction
Over the last decade, face recognition has become one of the most active research areas in multimedia information processing, driven by rapidly increasing requirements in many practical applications such as identity authentication, information security and human-computer interaction/communication. A major challenge of face recognition, however, is that the captured face image data often lie in a high-dimensional feature space. For example, a face image with the resolution of
The most popular conventional algorithms for dimensionality reduction are principal component analysis (PCA) and linear discriminant analysis (LDA) [1]. PCA maintains the global Euclidean structure of the data in the original high-dimensional space and projects the data points into a lower-dimensional subspace in which the sample variance is maximized. In contrast to the unsupervised PCA, LDA is a supervised learning approach. LDA seeks the projection axes on which the data points of different classes are far from each other while data points of the same class remain close to each other. The optimal projection of LDA is computed by simultaneously minimizing the within-class scatter and maximizing the between-class scatter. It is generally believed that LDA-based algorithms outperform PCA-based ones, since the former optimize the low-dimensional representation of the objects with the most discriminant information, while the latter simply achieve object reconstruction. In addition, non-negative matrix factorization (NMF) [2] has been proposed for face recognition. NMF learns the parts of objects by imposing non-negativity constraints, which leads to a parts-based representation of objects. However, the iterative update method for solving the NMF problem is computationally expensive. Although the above-mentioned algorithms have been widely applied to face recognition, they are designed to discover only the global Euclidean structure, while the local manifold structure is ignored.
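To make the two criteria above concrete, the following NumPy sketch contrasts PCA with two-class Fisher LDA on toy data; the data, dimensions and class means are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(50, 10))   # class 0
X1 = rng.normal(1.5, 1.0, size=(50, 10))   # class 1, shifted mean
X = np.vstack([X0, X1])

# PCA: top eigenvectors of the total covariance (maximal sample variance)
Xc = X - X.mean(axis=0)
_, pca_vecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
W_pca = pca_vecs[:, ::-1][:, :2]           # two principal axes
Z_pca = X @ W_pca                          # (100, 2) embedding

# Two-class Fisher LDA: w proportional to Sw^{-1} (mu1 - mu0), i.e. the
# direction maximizing between-class over within-class scatter
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w_lda = np.linalg.solve(Sw, m1 - m0)       # single discriminant direction
z_lda = X @ w_lda                          # (100,) projections

# The discriminant direction separates the two class means:
print(z_lda[:50].mean() < z_lda[50:].mean())  # True
```

Note that PCA returns up to 10 variance-ordered axes here, whereas two-class LDA yields only a single discriminant direction, which is why LDA-based methods are preferred when discrimination rather than reconstruction is the goal.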
Recently, a number of manifold learning algorithms have been proposed to discover the geometric properties of high-dimensional data spaces, and they have been successfully applied to face recognition. Manifold learning aims at discovering the geometric properties of the data space, such as its Euclidean embedding, intrinsic dimensionality, connected components, homology, etc. The desired manifold is an intrinsically lower-dimensional space hidden in the input high-dimensional space. The most well-known manifold learning algorithms include isometric feature mapping (ISOMAP) [3], locally linear embedding (LLE) [4] and Laplacian eigenmap (LE) [5]. ISOMAP, a variant of multidimensional scaling (MDS), aims to preserve the global geodesic distances between any pair of data points. LLE aims to embed data points in a low-dimensional space by finding the optimal linear reconstruction in a small neighbourhood. LE aims to preserve proximity relationships by manipulations on an undirected weighted graph, which encodes the neighbour relations of pairwise data points. These nonlinear methods have achieved impressive results on some pattern classification tasks; however, the mappings they derive are defined only on the training data points, so how to evaluate the maps on novel test data points remains unclear [6]. One common way to cope with this problem is to apply a linearization procedure to construct explicit maps over new samples. A representative example of this approach is locality preserving projection (LPP) [7], a linearization of LE. LPP is obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold. Although these dimensionality reduction methods are based on different embedding criteria, Yan et al. demonstrated that most of them can be mathematically unified within a general framework called graph embedding [8].
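As an illustration of the graph-based embedding idea behind LE (and, after linearization, LPP), the following NumPy sketch computes a Laplacian eigenmap on toy data; the k-nearest-neighbour rule, heat-kernel weights and data are illustrative choices, not the exact constructions used in [5] or [7].

```python
import numpy as np

def laplacian_eigenmap(X, k=5, t=1.0, d=2):
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Heat-kernel weights on a symmetrized k-nearest-neighbour graph
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]   # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(1))
    L = D - W                               # unnormalized graph Laplacian
    # Generalized eigenproblem L f = lambda D f; the smallest eigenvalue
    # gives the trivial constant eigenvector, which we skip
    vals, vecs = np.linalg.eig(np.linalg.solve(D, L))
    order = np.argsort(vals.real)
    return vecs.real[:, order[1:d + 1]]

Y = laplacian_eigenmap(np.random.default_rng(0).normal(size=(60, 5)))
print(Y.shape)  # (60, 2)
```

The out-of-sample limitation mentioned above is visible here: the embedding `Y` exists only for the 60 training points, and a new point has no image under the map without re-solving the eigenproblem, which is exactly what LPP's linearization avoids.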
Based on the graph embedding framework, marginal Fisher analysis (MFA) was developed to jointly consider the local manifold structure and the class label information for dimensionality reduction. However, MFA extracts discriminative information only from marginal samples, although non-marginal samples also contain discriminative information. Recently, Song et al. have proposed discriminative geometry preserving projections (DGPP) [9], which has yielded impressive results on scene classification. DGPP is fundamentally based on manifold learning, but it simultaneously considers both intraclass geometry and interclass discrimination for dimensionality reduction.
However, most previous dimensionality reduction algorithms operate on input face image data only after they have been transformed into 1-D vectors. In fact, face image data are intrinsically in the form of second- or higher-order tensors [10,11]. For example, grey-level face images are second-order tensor data (matrices) and can be expanded to third-order tensors by representing sets of images after Gabor filtering. Such vectorization therefore ignores the underlying data structure and often leads to the curse of dimensionality and the small sample size problem. To address these problems, many tensor-based dimensionality reduction algorithms have been proposed. Representative algorithms include tensor principal component analysis (TPCA) [12], tensor linear discriminant analysis (TLDA) [10], tensor locality preserving projection (TLPP) [13] and tensor marginal Fisher analysis (TMFA) [8]. However, most of the existing tensor-based algorithms simply use the traditional Euclidean distance to measure relationships among different data points. Despite its prevalent usage, the Euclidean distance ignores the relationships among different coordinates of high-order data, such as the spatial relations of pixels in images, which have been shown in many previous studies to be very useful for improving learning performance [14]. Therefore, both the embedding strategy and its related distance metric should be considered in designing tensor-based dimensionality reduction algorithms. However, most of the existing tensor learning algorithms fail to take into account the correlation among different coordinates of data with an arbitrary number of orders.
In this paper, we propose a novel distance adaptive tensor manifold learning algorithm for face recognition. By using the data-adaptive tensor distance metric proposed by Liu et al. [14], we can effectively exploit the spatial correlations of face image data to enhance learning performance. We discuss how to tensorize the discriminative geometry preserving projection, which gives rise to a distance adaptive tensor dimensionality reduction algorithm for face recognition.
The rest of the paper is organized as follows: in Section 2, we provide a brief review of the original vector-based discriminative geometry preserving projection (DGPP) algorithm. Our distance adaptive tensor manifold learning algorithm for face recognition is introduced in Section 3. The experimental results on face recognition are presented in Section 4. Finally, we provide the concluding remarks and suggestions for future work in Section 5.
2. Brief review of DGPP
The original vector-based discriminative geometry preserving projection (DGPP) [9] is a recently proposed manifold learning algorithm for dimensionality reduction; it can precisely model both the intraclass geometry and interclass discrimination by using an average weighted adjacency graph and the local linear reconstruction error. In addition, the original vector-based DGPP avoids the out-of-sample problem of traditional manifold learning algorithms by applying a linearization procedure to construct explicit maps over new samples.
Given a set of face images
with the constraint
where
where
Meanwhile,
Thus, we can easily obtain
As can be seen from the above statement, DGPP aims to look for a linear transformation matrix
Thus, the matrix
Although the original vector-based DGPP has shown promising results on scene classification, it unfolds each image into a single column vector before dimensionality reduction. In fact, image objects are intrinsically in the form of second- or higher-order tensors. Thus, the vectorization operation of DGPP largely increases the computational cost of image data analysis and destroys the intrinsic structural relationships among different coordinates of high-order image data, which have been shown in many previous studies to be very useful for improving learning performance [14,15].
3. Distance adaptive tensor discriminative geometry preserving projection for face recognition
In order to preserve the intrinsic relationships among different coordinates for high-order face image data, a natural way is to perform dimensionality reduction in their original high-order tensor space. In this section, we will describe our distance adaptive tensor discriminative geometry preserving projection approach which is fundamentally based on discriminative geometry preserving projection, as well as tensor structure representation and its closely related adaptive tensor distance metric. We begin with a review of a few tensor operations [10,16].
3.1. Review of tensor operations
Assume that a data sample is represented as an
The
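For readers unfamiliar with the notation, the following NumPy sketch illustrates two standard operations used throughout tensor-based methods, mode-n unfolding and the mode-n product with a matrix; the tensor sizes here are arbitrary examples.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_n_product(T, U, mode):
    """Mode-n product T x_n U: multiply every mode-n fibre of T by U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

T = np.arange(24, dtype=float).reshape(2, 3, 4)  # a third-order tensor
U = np.ones((5, 3))                              # maps mode-1 size 3 -> 5

print(unfold(T, 1).shape)             # (3, 8)
print(mode_n_product(T, U, 1).shape)  # (2, 5, 4)
```

Multilinear dimensionality reduction methods apply one such mode-n product per mode, each with its own (usually much smaller) transformation matrix, instead of one large matrix applied to a vectorized image.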
3.2. Data-adaptive tensor distance
An effective distance function plays a key role in tensor-based dimensionality reduction techniques, and a number of research efforts have shown that the performance of tensor-based dimensionality reduction algorithms not only depends on the embedding strategy but is also closely related to the distance metric. The most commonly used distance metric in tensor-based techniques is the Euclidean distance. However, the orthogonality assumption of the Euclidean distance ignores the relationships among different coordinates of high-order tensor data, such as the spatial relationships of pixels in images. To alleviate the orthogonality assumption and enhance the performance of tensor-based dimensionality reduction algorithms, we adopt the data-adaptive tensor distance metric proposed by Liu et al. [14].
Given a high-order tensor data
where
In order to model the intrinsic relationships between different coordinates, Wang et al. [15] have proved the following conclusion: for image data, if the metric coefficients depend properly on the distances between pixel locations, then the obtained distance metric can effectively reflect the spatial relationships between pixels. Following this idea, Liu et al. [14] designed the following metric matrix
where
Once obtaining the metric matrix
It is important to note that the above data-adaptive tensor distance metric can be reduced to the traditional Euclidean distance when
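The following NumPy sketch illustrates the idea behind such a data-adaptive tensor distance for a small grey-level image: the metric coefficients decay with the spatial distance between pixel locations, so two images whose active pixels are spatially close are judged closer than the plain Euclidean distance would suggest. The Gaussian form and the sigma value are illustrative assumptions in the spirit of [14,15], not the paper's exact coefficients.

```python
import numpy as np

def metric_matrix(h, w, sigma=1.0):
    """G[l, m] decays with the spatial distance between pixels l and m."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    P = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def adaptive_distance(X, Y, G):
    """d(X, Y) = sqrt((x - y)^T G (x - y)) on the vectorized tensors."""
    d = (X - Y).ravel()
    return float(np.sqrt(d @ G @ d))

G = metric_matrix(4, 4)
A = np.zeros((4, 4)); A[1, 1] = 1.0
B = np.zeros((4, 4)); B[1, 2] = 1.0   # A shifted by one pixel
C = np.zeros((4, 4)); C[3, 3] = 1.0   # A moved far away

# Euclidean distance cannot tell the two shifts apart; the adaptive one can:
print(np.linalg.norm(A - B) == np.linalg.norm(A - C))           # True
print(adaptive_distance(A, B, G) < adaptive_distance(A, C, G))  # True
```

When the off-diagonal coefficients of G vanish (G becomes the identity), `adaptive_distance` coincides with the Euclidean distance, which matches the reduction property noted above.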
3.3. Distance adaptive tensor discriminative geometry preserving projection
In order to preserve the intrinsic tensor structure of high-order face image data, a natural way is to perform dimensionality reduction in their original high-order tensor space. In the following, we discuss how to tensorize the discriminative geometry preserving projection which gives rise to a distance adaptive tensor discriminative geometry preserving projection (DATDGPP) algorithm for face recognition.
Given
Similar to the original vector-based DGPP method, to preserve both the local geometry and the discriminative information in the low-dimensional feature subspace, the optimal objective function of DATDGPP is defined as follows:
with the constraint
where the weighting factor
where
Similar to the original DGPP method, by imposing
Because of the difficulty in computing the optimal
with the constraint
where
The optimization problem of (15) subject to (16) can be approximately solved by the following standard eigenvalue decomposition problem:
Then the optimal transformation matrix
The solution procedure of the above iterative algorithm can be described as follows: first, we fix
Then we fix
Input: face image sample set
Output: transformation matrices
1: Construct the adaptive tensor distance metric in terms of (10);
2: Construct the weighting factor matrix
3: Obtain the reconstruction error
4: Construct the optimal objective function of DATDGPP in terms of (11) and (12);
5: Initialize iteration number
6: For
7: For
8: Compute
9: Compute the
10: Compute
11: Solve the eigenproblem:
12: If
13: Output the final transformation matrices
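The alternating structure of Algorithm 1 can be sketched as follows. To keep the sketch self-contained, the per-mode eigenproblem below maximizes a simple projected-variance (multilinear PCA-style) criterion as a stand-in for the DATDGPP objective of (11); the outer/inner loops, per-mode eigendecomposition and convergence test mirror steps 6-12.

```python
import numpy as np

def mode_n_product(T, U, mode):
    """Mode-n product: multiply every mode-`mode` fibre of T by U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

def alternating_projections(samples, dims, n_iter=10, tol=1e-6):
    order = samples[0].ndim
    # Step 5: initialize the mode transformation matrices
    U = [np.eye(samples[0].shape[k])[:, :dims[k]] for k in range(order)]
    prev = None
    for _ in range(n_iter):                       # step 6: outer iterations
        for k in range(order):                    # step 7: loop over modes
            n_k = samples[0].shape[k]
            S = np.zeros((n_k, n_k))
            for X in samples:
                Z = X
                for j in range(order):            # project all modes but k
                    if j != k:
                        Z = mode_n_product(Z, U[j].T, j)
                Zk = np.moveaxis(Z, k, 0).reshape(n_k, -1)
                S += Zk @ Zk.T                    # mode-k scatter (stand-in)
            vals, vecs = np.linalg.eigh(S)        # step 11: eigenproblem
            U[k] = vecs[:, ::-1][:, :dims[k]]     # keep the top eigenvectors
        obj = float(vals[-dims[-1]:].sum())       # surrogate objective value
        if prev is not None and abs(obj - prev) < tol:  # step 12: converged?
            break
        prev = obj
    return U                                      # step 13: output matrices

rng = np.random.default_rng(0)
data = [rng.normal(size=(8, 6)) for _ in range(20)]
U1, U2 = alternating_projections(data, dims=(3, 2))
print(U1.shape, U2.shape)  # (8, 3) (6, 2)
```

Because the transformation matrices are coupled, no closed-form joint solution exists; fixing all but one matrix turns each subproblem into a standard eigendecomposition, which is exactly the structure Algorithm 1 exploits.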
In Algorithm 1, we described in detail the procedure for learning the transformation matrices in an iterative manner. In the following, we analyse the computational complexity of DATDGPP and prove the convergence of the proposed iterative algorithm.
To easily analyse the computational complexity, we assume that the high-order tensor data have the same size in each dimension, i.e.,
In fact, in each iteration of Algorithm 1, since each update of transformation matrix
In addition, an upper bound exists for the objective function in (11), i.e.,
Therefore, the objective function
It is worth noting that our proposed DATDGPP algorithm reduces to the tensor DGPP (TDGPP) algorithm, which adopts the traditional Euclidean distance metric, when the tensor distance
3.4. Face recognition using DATDGPP
Once the transformation matrix of DATDGPP is obtained, we can apply it to project the face images into a low-dimensional subspace. Then the face recognition problem becomes a pattern classification task and the traditional classification algorithms can be applied in the low-dimensional subspace. In this paper, we apply the nearest-neighbour classifier because of its simplicity.
With the learned transformation matrices
When a new testing face image data
Then the class label of
and the face image
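The recognition step can be sketched as follows: every gallery image and the probe are projected with the learned mode transformation matrices, and the probe takes the label of its nearest gallery neighbour. Here `U1` and `U2` are random placeholders for the learned matrices, the images are random toy matrices, and the subspace distance is plain Euclidean, all purely for illustration.

```python
import numpy as np

def project(X, U1, U2):
    """Project a second-order tensor (matrix) image into the subspace."""
    return U1.T @ X @ U2

rng = np.random.default_rng(0)
U1 = rng.normal(size=(16, 4))   # placeholder for a learned mode-1 matrix
U2 = rng.normal(size=(16, 4))   # placeholder for a learned mode-2 matrix
gallery = [rng.normal(size=(16, 16)) for _ in range(6)]
labels = [0, 0, 1, 1, 2, 2]
probe = gallery[3] + 0.01 * rng.normal(size=(16, 16))  # near gallery[3]

feats = [project(X, U1, U2) for X in gallery]
q = project(probe, U1, U2)
# Nearest-neighbour rule in the low-dimensional subspace
pred = labels[int(np.argmin([np.linalg.norm(q - f) for f in feats]))]
print(pred)  # label of gallery[3], i.e. 1
```

Each 16x16 image is compressed to a 4x4 feature matrix before matching, so the nearest-neighbour search runs in the reduced subspace rather than on raw pixels.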
4. Experimental results
In this section, we first compare our proposed DATDGPP algorithm with the original vector-based DGPP algorithm and the tensor DGPP (TDGPP) algorithm, which adopts the traditional Euclidean distance metric. Then, we compare the proposed DATDGPP algorithm with tensor principal component analysis (TPCA) [12], tensor linear discriminant analysis (TLDA) [10], tensor locality preserving projection (TLPP) [13] and tensor marginal Fisher analysis (TMFA) [8], four of the most popular tensor-based dimensionality reduction algorithms in face recognition.
4.1. Face databases
Our empirical study on face recognition was conducted on three real-world face databases: the Yale database, the Olivetti Research Laboratory (ORL) database and the PIE (pose, illumination and expression) database from CMU. In all the experiments, preprocessing to locate the faces was applied. Original images were manually aligned according to the eye position, cropped and normalized to the resolution of
In all the experiments the recognition process has three steps. First, we calculate the face subspace from the training samples; then the new face image to be identified is projected into
The Yale database
(http://cvc.yale.edu/projects/yalefaces/yalefaces.html) contains 165 front view face images of 15 individuals. Eleven images were collected from each individual with varying facial expressions and configurations. Eleven sample images of one person from the Yale database are shown in Figure 1.

Face image examples from the Yale database
The ORL face database
(http://www.uk.research.att.com/facedatabase.html) contains 400 images of 40 individuals. Some images were captured at different times and have different variations including expression, lighting and facial details. The images were taken with a tolerance for some tilting and rotation of the face up to 20 degrees. Ten sample images of one person from the ORL database are shown in Figure 2.

Face image examples from the ORL database
The CMU PIE face database [17] contains 41,368 face images of 68 subjects in total. The face images were captured by 13 synchronized cameras and 21 flashes, under varying pose, illumination and expression. In this work, we use a subset with five near-frontal poses (C05, C07, C09, C27 and C29) and two illuminations (indexed as 08 and 11). Therefore, each person has ten images. Figure 3 shows ten example images of one person from the PIE database.

Face image examples from the PIE database
4.2. Results
We conducted two experiments on each database. In each experiment, the face image set was randomly partitioned into training and testing sets of different sizes. For ease of representation, the experiments are named as G
In the first experiment, we compare the recognition accuracy and running time of the DGPP, TDGPP and DATDGPP algorithms under different training and testing partitions. In general, the performance of all these algorithms varies with the number of dimensions. We report the maximal average recognition accuracy, the corresponding optimal reduced dimension and the running time obtained by the DGPP, TDGPP and DATDGPP algorithms on the three databases in Tables 1-6. In the second experiment, we compare our proposed DATDGPP algorithm with four other representative tensor-based dimensionality reduction algorithms: TPCA, TLDA, TLPP and TMFA. Tables 1, 3 and 5 report the maximal average recognition accuracies and the corresponding optimal reduced dimensions of the TPCA, TLDA, TLPP, TMFA and DATDGPP algorithms on the three databases. Due to space limitations, we omit the plots of recognition accuracy versus reduced dimension on the three databases.
Comparisons of maximal average recognition accuracy (in percent) as well as the optimal reduced dimension on the Yale database
Comparisons of running time (second) on the Yale database
Comparisons of maximal average recognition accuracy (in percent) as well as the optimal reduced dimension on the ORL database
Comparisons of running time (second) on the ORL database
Comparisons of maximal average recognition accuracy (in percent) as well as the optimal reduced dimension on the CMU PIE database
Comparisons of running time (second) on the CMU PIE database
The main observations from the above performance comparisons include:
1) The tensor-based DGPP algorithms (i.e., TDGPP and DATDGPP) perform much better than the original vector-based DGPP algorithm, which demonstrates that the tensor structure representation can effectively use the correlation among different coordinates of face image data to enhance the recognition performance.
2) Our proposed DATDGPP algorithm consistently outperforms the traditional Euclidean distance-based TDGPP algorithm. This is because the orthogonality assumption of the traditional Euclidean distance may not reflect the real distance between two tensor representation-based face images, while our proposed data-adaptive tensor distance can effectively reflect the spatial relationships between pixels.
3) The running times of the tensor-based DGPP algorithms (i.e., TDGPP and DATDGPP) are much shorter than that of the original vector-based DGPP algorithm. This result is consistent with the above analysis of computational complexity, i.e., the computational complexities of the tensor-based DGPP algorithms are much lower than that of the original vector-based DGPP algorithm.
4) The TPCA algorithm performs the worst among the compared algorithms. A possible explanation is as follows: like the traditional PCA, TPCA is unsupervised; it simply achieves object reconstruction, which is not necessarily useful for discrimination and classification tasks.
5) The average performances of TLDA and TLPP are similar. On some databases TLDA outperforms TLPP, while on others TLPP is better than TLDA. This demonstrates that it is hard to say whether local manifold structure or class label information is more important.
6) The TMFA algorithm performs better than the TLDA and TLPP algorithms in all experiments. This observation indicates the importance of utilizing both class label information and local manifold structure, as well as describing the separability of different classes with a margin criterion. However, the performance of TMFA is still inferior to that of DATDGPP, because TMFA extracts discriminative information only from marginal points, although non-marginal points also contain discriminative information.
7) Our proposed DATDGPP algorithm consistently outperforms the TPCA, TLDA, TLPP and TMFA algorithms. This can be attributed to two facts: first, DATDGPP precisely models both the intraclass geometry and interclass discrimination; second, the data-adaptive tensor distance effectively reflects the spatial relationships between pixels. Therefore, our proposed DATDGPP algorithm achieves the best performance among the compared algorithms by combining the adaptive tensor distance with the discriminative geometry preserving projection strategy.
5. Conclusion and future work
In this paper, we proposed a novel distance adaptive tensor discriminative geometry preserving projection (DATDGPP) algorithm for face recognition. The advantages of DATDGPP are as follows: it preserves the natural tensor structure of face images; it adopts a data-adaptive tensor distance to model the correlation among different coordinates of tensor data; and its transformation matrices preserve both discriminative information and local geometry. Experiments have demonstrated the effectiveness of the proposed algorithm compared with the original DGPP algorithm and other representative tensor-based dimensionality reduction algorithms. Since the proposed DATDGPP algorithm is a general tensor dimensionality reduction algorithm for high-order tensor data, we plan to apply it to the classification of other high-order tensor data (such as video data) in the future.
