Abstract
1. Introduction
In the last few decades, face recognition has attracted increasing attention in the fields of computer vision and pattern recognition [1-3]. As one of the most successful applications of biometrics, face recognition can be used in social robotics to fulfill the person identification task in a natural, non-contact way. In practice, face patterns are subject to changes in illumination, pose, facial expression, etc. Among these challenges, face recognition with real disguise is an important and hard problem. Therefore, robust vision-based face recognition has been extensively studied by researchers in computer vision, robotics, artificial intelligence, and related areas.
Generally, a face image is stretched into a high-dimensional face vector, and feature extraction and dimensionality reduction algorithms are applied in the face space, so that the high-dimensional face vector is transformed into a low-dimensional subspace in which classification and identification can be performed. Two classical linear face recognition methods are principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5]. PCA is widely used to reduce the dimensionality of original face images, and the extracted Eigenface features serve as inputs for other methods. LDA is a supervised subspace learning method, which seeks the projection directions that maximize the between-class scatter while minimizing the within-class scatter. Typical nonlinear methods are kernel extensions of the linear ones, which apply a kernel transformation to enhance the classification ability; see, for example, [6, 7]. Other nonlinear methods are manifold learning algorithms, e.g., locally linear embedding (LLE) [8] and locality preserving projection (LPP) [9], which assume that the distribution of face image data is close to manifolds embedded in the high-dimensional space.
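As a concrete illustration of the linear subspace idea, the Eigenface projection of PCA can be sketched in a few lines of NumPy. This is a toy sketch only: the random matrix stands in for stretched face vectors, and the function name and choice of 20 components are illustrative, not taken from the paper.

```python
import numpy as np

def pca_project(X, n_components):
    """Project row-vector samples X onto the top principal components.

    X: (n_samples, n_features) matrix, one stretched face vector per row.
    Returns (projections, mean, components).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    # SVD of the centered data; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # the "Eigenfaces" for face data
    return Xc @ components.T, mean, components

# toy data standing in for 100 stretched 42x30 face images
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 42 * 30))
Z, mean, comps = pca_project(X, n_components=20)
print(Z.shape)  # (100, 20)
```

The low-dimensional rows of `Z` would then feed a classifier, as described above.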
In 2007, graph embedding (GE) [10] was proposed as a general framework that unifies a series of dimensionality reduction algorithms for face recognition. Each algorithm can be viewed as a particular graph embedding, in which a specific graph describes a certain statistical or geometric property of the data set. Within the GE framework, marginal Fisher analysis (MFA) [10] and neighborhood discriminant embedding (NDE) [11] were subsequently proposed. These algorithms better reveal representative and discriminative features from the underlying manifold structures of face images.
Recently, sparse representation has been introduced from compressive sensing theory into the field of pattern recognition; the sparse representation-based classification (SRC) [12] is a landmark algorithm for robust face recognition, which can deal with face occlusion, corruption and real disguise. The basic idea of SRC is to represent the query face image using a small number of atoms parsimoniously chosen out of an over-complete dictionary consisting of all training samples. The sparsity constraint on the coding coefficients ensures that only a few samples from the same class as the query face have distinct nonzero values, whereas the coefficients of the other samples are equal or close to zero. The sparsity of the coding coefficients can be directly measured by the ℓ0-norm, i.e., the number of nonzero entries of the coding vector, which in practice is relaxed to the ℓ1-norm. The representation fidelity of SRC is measured by the ℓ2-norm of the coding residual.

Figure 1. Five subjects from the AR database: the first row shows five samples without disguise, the second row five samples with sunglasses, and the third row five samples with scarf.
In RSC, the iteratively reweighted regularized robust coding (IR3C) [15] algorithm is proposed to solve the MLE formulation of the coding problem. Usually more than 10 iterations are needed before IR3C converges. To improve the efficiency of the algorithm and increase the robustness of RSC in dealing with real face disguise, in this paper we propose an improved robust sparse coding (iRSC) algorithm. In each iteration, the dictionary, which consists of all training samples, is reduced by eliminating the subjects with larger coding residuals. The reduced dictionary is then used to obtain the converged MLE solution of the sparse coding problem. By eliminating the interference of the subjects with larger coding residuals, iRSC converges faster and is more efficient. Our experiments on the AR face database [16] show that iRSC achieves better performance than SRC and RSC when dealing with real face disguise.
The rest of this paper is organized as follows: Section 2 reviews the algorithms of SRC and RSC. Section 3 presents our proposed iRSC. Section 4 conducts the experiments, and Section 5 concludes the paper.
2. Reviews of SRC and RSC
In this section, we review two sparse representation-based face recognition algorithms. Given a face image from a certain face database, it is stored as a grayscale matrix and stretched into a vector y ∈ R^m.
2.1 Sparse Representation-based Classification (SRC)
In SRC, the over-complete dictionary consists of all training samples, A = [A1, A2, ..., Ac] ∈ R^(m×n), where Ai collects the training samples of the i-th class as its columns. The sparsest coding of the query sample y over A is

x̂0 = arg min_x ‖x‖0   s.t.   y = Ax,   (1)

where ‖•‖0 is the ℓ0-norm, which counts the nonzero entries of the coding vector x. Since ℓ0-minimization is NP-hard, it is relaxed to the convex ℓ1-minimization

x̂1 = arg min_x ‖x‖1   s.t.   ‖y − Ax‖2 ≤ ε.   (2)
When SRC deals with face occlusion and corruption, it introduces an identity matrix I ∈ R^(m×m) to code the occluded or corrupted pixels, and extends the dictionary to B = [A, I]. The coding problem then becomes

ŵ = arg min_w ‖w‖1   s.t.   ‖y − Bw‖2 ≤ ε,   (3)

where w = [x; e] and e is the coding vector over the identity matrix.
According to [17], Eq. (3) is equivalent to the Lagrangian formulation

ŵ = arg min_w ‖y − Bw‖2^2 + λ‖w‖1,   (4)

where λ is a positive regularization parameter that balances the reconstruction error and the sparsity. Here SRC uses an ℓ1-minimization solver to compute the sparse coding vector ŵ.
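The Lagrangian ℓ1-regularized least-squares problem above can be solved by standard methods; the sketch below uses ISTA (iterative soft thresholding), one simple option among many, and is not the solver used in the paper. The toy dictionary, λ value, and iteration count are all illustrative; the step size is derived from the spectral norm of B.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(B, y, lam, n_iter=5000):
    """Minimize ||y - B w||_2^2 + lam * ||w||_1 by iterative soft thresholding."""
    L = np.linalg.norm(B, 2) ** 2       # ||B||_2^2; gradient Lipschitz const is 2L
    w = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ w - y)        # half the gradient of the quadratic term
        w = soft_threshold(w - grad / L, lam / (2 * L))
    return w

# toy example: y is a sparse combination of dictionary columns plus small noise
rng = np.random.default_rng(1)
B = rng.normal(size=(60, 120))
w_true = np.zeros(120)
w_true[[3, 40]] = [1.5, -2.0]
y = B @ w_true + 0.01 * rng.normal(size=60)
w_hat = ista(B, y, lam=0.2)
```

With enough iterations the recovered coding `w_hat` concentrates on the two true atoms, mirroring how SRC expects the query to be coded by a few training samples.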
The classification criterion of SRC is to find which class of training samples best represents the query sample:

identity(y) = arg min_i r_i(y),   r_i(y) = ‖y − A δ_i(x̂) − ê‖2,   (5)

where δ_i(x̂) is the vector whose only nonzero entries are the entries of x̂ associated with class i, and ê is the estimated occlusion coding vector.
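The class-wise residual rule can be sketched as follows: for each class, keep only that class's coding entries and compare reconstruction residuals. The function name and toy data are illustrative; the occlusion term ê is omitted for brevity.

```python
import numpy as np

def src_classify(A, labels, x_hat, y):
    """Assign y to the class whose training samples best reconstruct it.

    A: (m, n) dictionary of training samples (one column per sample).
    labels: length-n array of class labels for the columns of A.
    x_hat: sparse coding coefficients of y over A.
    """
    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x_hat, 0.0)   # keep class-c entries only
        residuals[c] = np.linalg.norm(y - A @ delta)
    return min(residuals, key=residuals.get), residuals

# toy check: a query built purely from class-1 columns is assigned to class 1
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 12))
labels = np.repeat([0, 1, 2], 4)
x = np.zeros(12)
x[4:8] = rng.normal(size=4)                         # class-1 coefficients
y = A @ x
pred, res = src_classify(A, labels, x, y)
print(pred)  # 1
```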
2.2 Robust Sparse Coding (RSC)
In SRC, the sparse representation fidelity is actually measured by the ℓ2-norm of the coding residual e = y − Ax, which implicitly assumes that the residual follows a Gaussian distribution. This assumption does not hold when the face image is occluded or corrupted. RSC [14] instead seeks the maximum likelihood estimation (MLE) solution of the sparse coding problem, which leads to the weighted regularized coding model

x̂ = arg min_x ‖W^(1/2)(y − Ax)‖2^2 + λ‖x‖1,   (6)

where W is a diagonal weight matrix estimated from the coding residuals. The element of W associated with pixel i is computed by the logistic function

W_(i,i) = exp(μδ − μe_i^2) / (1 + exp(μδ − μe_i^2)),   (7)

where e_i is the coding residual of pixel i, and μ and δ are positive scalars that control the decreasing rate and the location of the weight function, respectively. In this way, the pixels with large residuals (outliers, e.g., occluded pixels) receive small weights and their influence on the coding is suppressed.
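The logistic down-weighting of large-residual pixels can be sketched as follows. The values of μ and δ here are illustrative, not the settings used in [14]; the expression is rewritten in the numerically stable sigmoid form.

```python
import numpy as np

def rsc_weights(e, mu, delta):
    """Logistic weights: pixels whose squared residual is well below delta get
    weight near 1; pixels with large residuals (outliers) get weight near 0."""
    z = mu * (delta - e ** 2)
    # exp(z) / (1 + exp(z)) == 1 / (1 + exp(-z)), stable for large residuals
    return 1.0 / (1.0 + np.exp(-z))

e = np.array([0.0, 0.1, 0.5, 2.0])   # per-pixel coding residuals (toy values)
w = rsc_weights(e, mu=8.0, delta=0.25)
print(np.round(w, 3))
```

A clean pixel (e = 0) keeps a weight close to 1, the pixel at e^2 = δ sits exactly at weight 0.5, and a grossly occluded pixel (e = 2) is effectively switched off.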
The IR3C algorithm can solve the RSC model; however, it usually needs more than 10 iterations to converge, and each iteration has high computational complexity. To reduce the computational cost and enhance the robustness, we propose the improved robust sparse coding (iRSC) algorithm presented in the next section.
3. The Improved Robust Sparse Coding (iRSC)
In both SRC and RSC, all training samples compose the over-complete dictionary, and each query sample is represented as a sparse linear combination over the dictionary. The sparsity constraint on the coding coefficients and the iterative solving algorithm make the computational cost of RSC very high. Since the dictionary is over-complete for sparse representation (for example, the AR database has 100 classes, and practical systems may have even more), the dictionary can be reduced in the iterative steps that calculate the weight matrix W.
After each iteration step, the classes with larger coding residuals are eliminated from the dictionary, and only a fraction of the classes, determined by the retention factor, is kept for the next iteration.

Figure 2. (a) The size reduction curve of the dictionary on the AR database. (b) The convergence curves of RSC and iRSC; the difference of the coding coefficients between two adjacent iterations is used as the convergence measure.
Algorithm 1. The improved robust sparse coding (iRSC).
where
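The dictionary-reduction step at the heart of iRSC can be sketched as below: after a coding step, the classes with the largest class-wise residuals are dropped, and only a fraction of the classes is carried into the next iteration. The retention ratio, function name, and toy data are illustrative; this is a sketch of the reduction step alone, not the full iRSC loop.

```python
import numpy as np

def reduce_dictionary(A, labels, x_hat, y, keep_ratio=0.5):
    """Keep only the classes with the smallest class-wise coding residuals.

    Returns the reduced dictionary and the labels of its remaining columns.
    """
    classes = np.unique(labels)
    res = []
    for c in classes:
        delta = np.where(labels == c, x_hat, 0.0)   # class-c coding entries only
        res.append(np.linalg.norm(y - A @ delta))
    n_keep = max(1, int(np.ceil(keep_ratio * len(classes))))
    kept = classes[np.argsort(res)[:n_keep]]        # small-residual classes stay
    mask = np.isin(labels, kept)
    return A[:, mask], labels[mask]

# toy check: the query comes from class 2, which must survive the reduction
rng = np.random.default_rng(3)
A = rng.normal(size=(30, 20))
labels = np.repeat(np.arange(5), 4)
x = np.zeros(20)
x[8:12] = 1.0                                       # class-2 coefficients
y = A @ x
A2, l2 = reduce_dictionary(A, labels, x, y, keep_ratio=0.4)
print(sorted(set(l2.tolist())))
```

Because the well-matching classes are never discarded, the reduced dictionary stays over-complete enough for sparse coding while each subsequent iteration becomes cheaper.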
4. Experimental Results
In this paper, we focus on face recognition with real disguise. Therefore, we conduct our experiments on the AR face database [16], which contains samples with sunglasses or scarf; see Figure 1. We compare iRSC with SRC [12] and RSC [14], which are the benchmark sparse representation methods for face recognition. A subset of the AR database is used in the experiments: 600 images (6 non-occluded frontal-view samples per class, 3 from Session 1 and 3 from Session 2) from 100 subjects (50 males and 50 females) for training, and 200 images (2 samples per class, with sunglasses or scarf) from 100 subjects for testing. Figure 3 shows 6 training samples with facial expression changes and 2 testing samples with neutral expression from the first subject in the AR database.

Figure 3. (a) Six training samples and (b) two testing samples from the first subject in the AR database.
The images are resized to 42 × 30; the parameters μ and δ are set the same as in [15], and the regularization parameter λ is set to 0.001 by default. Figure 4(a) shows a test image with sunglasses; Figure 4(b) is the training sample associated with the maximum coding entry; Figures 4(c) and 4(d) show the minimum of residuals and the final weight map by RSC and iRSC, respectively.

Figure 4. An example of face recognition with disguise using RSC and iRSC. (a) A test image with sunglasses. (b) The training sample associated with the maximum coding entry by both RSC and iRSC. (c) and (d) The minimum of residuals and the final weight map by RSC and iRSC, respectively.
In Figure 5, (a) and (b) show the sparse coding of the test sample and the residuals of each class by RSC. Only one training sample has a large coding entry, while the others are close to zero; likewise, only the residual associated with the identified subject is close to zero, while the others are very large. (c) and (d) show the corresponding results by iRSC. As a result of the size reduction of the dictionary, the coding becomes even sparser, while the same recognition result is achieved.

Figure 5. (a) The sparse coding of the test sample by RSC; the identified sample is marked. (b) The residuals of each class by RSC. (c) The sparse coding of the test sample by iRSC. (d) The residuals of each class by iRSC.
The face recognition results by SRC, RSC and iRSC are listed in Table 2. Although the dictionary is reduced in iRSC, it still achieves recognition rates competitive with RSC for both the sunglasses and the scarf disguise. SRC performs poorly with the scarf disguise (only 38% accuracy), in which about 40% of the face region is covered; the reason is that SRC cannot handle occlusions larger than about 30%.
Table 2. Recognition rates by competing methods on the AR database with disguise occlusion.
In the experiments, the programming environment is Matlab 7.0a, running on a computer with a 3.10 GHz Intel(R) Core(TM) i5-2400 CPU and 4.00 GB RAM. The average runtimes of the three methods are listed in Table 3. As a result of the size reduction of the dictionary, the average runtime of iRSC is much shorter than those of both SRC and RSC, since the dictionary shrinks in each iteration and the cost per iteration decreases accordingly.
Table 3. Average runtimes by competing methods on the AR database with disguise occlusion.
Here, we conduct a more challenging task in which the testing samples have more facial expressions. Another subset of the AR database is used in this experiment, consisting of 700 training images (7 non-occluded frontal-view samples per class). The testing set consists of 600 images (6 samples per class, with sunglasses or scarf). The other parameters are the same as in the first experiment. The face recognition accuracies and average runtimes by SRC, RSC and iRSC are listed in Table 4. The iRSC with a fixed retention ratio achieves better results than the one using Eq. (10) in this case, while RSC still attains the highest accuracy among them. The different ways to choose the retention factor thus lead to different trade-offs between accuracy and runtime.
Table 4. Recognition rates and average runtimes (in parentheses, in seconds) by competing methods on the AR database with disguise occlusion.
5. Conclusion
This paper presented an improved robust sparse coding (iRSC) algorithm for robust face recognition with real disguise. The advantages of RSC, namely its robustness to various types of outliers and to large occluded regions, are well preserved. By reducing the size of the dictionary in each iterative step, iRSC lowers its computational complexity significantly: its average runtime is only about 16% of that of RSC. In this process, the over-complete property of the dictionary is not affected; therefore, iRSC still achieves recognition rates competitive with RSC. The experimental results on the AR face database demonstrate that iRSC has better overall performance than SRC and RSC. With a high recognition rate and low computational cost, iRSC is a good candidate for practical robotic systems to fulfill robust face recognition tasks.
