Abstract
1. Introduction
In the last few decades, face recognition has attracted increasing attention in the fields of computer vision and pattern recognition [1-3]. As one of the most successful applications of biometrics, face recognition can be used in social robotics to fulfill the person identification task in a natural, non-contact way. In practice, face patterns are subject to changes in illumination, pose, facial expression, etc. Among these challenges, face recognition with real disguise is an important and hard problem. Therefore, robust vision-based face recognition has been extensively studied by researchers in computer vision, robotics, artificial intelligence, and related areas.
Generally, a face image is stretched into a high-dimensional face vector, and feature extraction and dimensionality reduction algorithms are applied in the face space, so that the high-dimensional face vector is transformed into a low-dimensional subspace in which classification and identification can be performed. Two classical linear face recognition methods are principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5]. PCA is widely used to reduce the dimensionality of original face images, and the extracted Eigenface features serve as inputs for other methods. LDA is a supervised subspace learning method, which seeks the projection directions that maximize the between-class scatter while minimizing the within-class scatter. Typical nonlinear methods are kernel extensions of the linear ones, which apply a kernel transformation to enhance the classification ability; see, for example, [6, 7]. Other nonlinear methods are manifold learning algorithms, e.g., locally linear embedding (LLE) [8] and locality preserving projection (LPP) [9], which assume that the distribution of face image data is close to manifolds embedded in the high-dimensional space.
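As a concrete illustration of the linear subspace idea, the Eigenface projection of PCA can be sketched in a few lines of NumPy. This is a toy sketch only: the random matrix stands in for stretched face vectors, and the function name and choice of 20 components are illustrative, not taken from the paper.

```python
import numpy as np

def pca_project(X, n_components):
    """Project row-vector samples X onto the top principal components.

    X: (n_samples, n_features) matrix, one stretched face vector per row.
    Returns (projections, mean, components).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    # SVD of the centered data; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # the "Eigenfaces" for face data
    return Xc @ components.T, mean, components

# toy data standing in for 100 stretched 42x30 face images
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 42 * 30))
Z, mean, comps = pca_project(X, n_components=20)
print(Z.shape)  # (100, 20)
```

The low-dimensional rows of `Z` would then feed a classifier, as described above.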
In 2007, graph embedding (GE) [10] was proposed as a general framework that unifies a series of dimensionality reduction algorithms for face recognition. Each algorithm can be viewed as a particular graph embedding, in which a specific graph describes a certain statistical or geometric property of the data set. Within the GE framework, marginal Fisher analysis (MFA) [10] and neighborhood discriminant embedding (NDE) [11] were subsequently proposed. These algorithms better reveal representative and discriminative features from the underlying manifold structures of face images.
Recently, sparse representation has been introduced from compressive sensing theory into the field of pattern recognition; the sparse representation-based classification (SRC) [12] is a landmark algorithm for robust face recognition, which can deal with face occlusion, corruption and real disguise. The basic idea of SRC is to represent the query face image using a small number of atoms parsimoniously chosen out of an over-complete dictionary consisting of all training samples. The sparsity constraint on the coding coefficients ensures that only a few samples from the same class as the query face have distinct nonzero values, whereas the coefficients of the other samples are equal or close to zero. The sparsity of the coding coefficients can be directly measured by the ℓ0-norm, i.e., the number of nonzero entries of the coding vector, which in practice is relaxed to the ℓ1-norm. The representation fidelity of SRC is measured by the ℓ2-norm of the coding residual.

Figure 1. Five subjects from the AR database: the first row shows five samples without disguise, the second row five samples with sunglasses, and the third row five samples with scarf.
In RSC, the iteratively reweighted regularized robust coding (IR3C) [15] algorithm is proposed to solve the MLE formulation of the coding problem. Usually more than 10 iterations are needed before IR3C converges. To improve the efficiency of the algorithm and increase the robustness of RSC in dealing with real face disguise, in this paper we propose an improved robust sparse coding (iRSC) algorithm. In each iteration, the dictionary, which consists of all training samples, is reduced by eliminating the subjects with larger coding residuals. The reduced dictionary is then used to obtain the converged MLE solution of the sparse coding problem. By eliminating the interference of the subjects with larger coding residuals, iRSC converges faster and is more efficient. Our experiments on the AR face database [16] show that iRSC achieves better performance than SRC and RSC when dealing with real face disguise.
The rest of this paper is organized as follows: Section 2 reviews the algorithms of SRC and RSC. Section 3 presents our proposed iRSC. Section 4 conducts the experiments, and Section 5 concludes the paper.
2. Reviews of SRC and RSC
In this section, we review two sparse representation-based face recognition algorithms. Given a face image from a certain face database, it is stored as a grayscale matrix and stretched into a vector y ∈ R^m.
2.1 Sparse Representation-based Classification (SRC)
In SRC, the over-complete dictionary consists of all training samples, A = [A1, A2, ..., Ac] ∈ R^(m×n), where Ai collects the training samples of the i-th class as its columns. The sparsest coding of the query sample y over A is

x̂0 = arg min_x ‖x‖0   s.t.   y = Ax,   (1)

where ‖•‖0 is the ℓ0-norm, which counts the nonzero entries of the coding vector x. Since ℓ0-minimization is NP-hard, it is relaxed to the convex ℓ1-minimization

x̂1 = arg min_x ‖x‖1   s.t.   ‖y − Ax‖2 ≤ ε.   (2)
When SRC deals with face occlusion and corruption, it introduces an identity matrix I ∈ R^(m×m) to code the occluded or corrupted pixels, and extends the dictionary to B = [A, I]. The coding problem then becomes

ŵ = arg min_w ‖w‖1   s.t.   ‖y − Bw‖2 ≤ ε,   (3)

where w = [x; e] and e is the coding vector over the identity matrix.
According to [17], Eq. (3) is equivalent to the Lagrangian formulation

ŵ = arg min_w ‖y − Bw‖2^2 + λ‖w‖1,   (4)

where λ is a positive regularization parameter that balances the reconstruction error and the sparsity. Here SRC uses an ℓ1-minimization solver to compute the sparse coding vector ŵ.
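The Lagrangian ℓ1-regularized least-squares problem above can be solved by standard methods; the sketch below uses ISTA (iterative soft thresholding), one simple option among many, and is not the solver used in the paper. The toy dictionary, λ value, and iteration count are all illustrative; the step size is derived from the spectral norm of B.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(B, y, lam, n_iter=5000):
    """Minimize ||y - B w||_2^2 + lam * ||w||_1 by iterative soft thresholding."""
    L = np.linalg.norm(B, 2) ** 2       # ||B||_2^2; gradient Lipschitz const is 2L
    w = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ w - y)        # half the gradient of the quadratic term
        w = soft_threshold(w - grad / L, lam / (2 * L))
    return w

# toy example: y is a sparse combination of dictionary columns plus small noise
rng = np.random.default_rng(1)
B = rng.normal(size=(60, 120))
w_true = np.zeros(120)
w_true[[3, 40]] = [1.5, -2.0]
y = B @ w_true + 0.01 * rng.normal(size=60)
w_hat = ista(B, y, lam=0.2)
```

With enough iterations the recovered coding `w_hat` concentrates on the two true atoms, mirroring how SRC expects the query to be coded by a few training samples.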
The classification criterion of SRC is to find which class of training samples best represents the query sample:

identity(y) = arg min_i r_i(y),   r_i(y) = ‖y − A δ_i(x̂) − ê‖2,   (5)

where δ_i(x̂) is the vector whose only nonzero entries are the entries of x̂ associated with class i, and ê is the estimated occlusion coding vector.
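The class-wise residual rule can be sketched as follows: for each class, keep only that class's coding entries and compare reconstruction residuals. The function name and toy data are illustrative; the occlusion term ê is omitted for brevity.

```python
import numpy as np

def src_classify(A, labels, x_hat, y):
    """Assign y to the class whose training samples best reconstruct it.

    A: (m, n) dictionary of training samples (one column per sample).
    labels: length-n array of class labels for the columns of A.
    x_hat: sparse coding coefficients of y over A.
    """
    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x_hat, 0.0)   # keep class-c entries only
        residuals[c] = np.linalg.norm(y - A @ delta)
    return min(residuals, key=residuals.get), residuals

# toy check: a query built purely from class-1 columns is assigned to class 1
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 12))
labels = np.repeat([0, 1, 2], 4)
x = np.zeros(12)
x[4:8] = rng.normal(size=4)                         # class-1 coefficients
y = A @ x
pred, res = src_classify(A, labels, x, y)
print(pred)  # 1
```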
2.2 Robust Sparse Coding (RSC)
In SRC, the sparse representation fidelity is actually measured by the ℓ2-norm of the coding residual e = y − Ax, which implicitly assumes that the residual follows a Gaussian distribution. This assumption does not hold when the face image is occluded or corrupted. RSC [14] instead seeks the maximum likelihood estimation (MLE) solution of the sparse coding problem, which leads to the weighted regularized coding model

x̂ = arg min_x ‖W^(1/2)(y − Ax)‖2^2 + λ‖x‖1,   (6)

where W is a diagonal weight matrix estimated from the coding residuals. The element of W associated with pixel i is computed by the logistic function

W_(i,i) = exp(μδ − μe_i^2) / (1 + exp(μδ − μe_i^2)),   (7)

where e_i is the coding residual of pixel i, and μ and δ are positive scalars that control the decreasing rate and the location of the weight function, respectively. In this way, the pixels with large residuals (outliers, e.g., occluded pixels) receive small weights and their influence on the coding is suppressed.
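The logistic down-weighting of large-residual pixels can be sketched as follows. The values of μ and δ here are illustrative, not the settings used in [14]; the expression is rewritten in the numerically stable sigmoid form.

```python
import numpy as np

def rsc_weights(e, mu, delta):
    """Logistic weights: pixels whose squared residual is well below delta get
    weight near 1; pixels with large residuals (outliers) get weight near 0."""
    z = mu * (delta - e ** 2)
    # exp(z) / (1 + exp(z)) == 1 / (1 + exp(-z)), stable for large residuals
    return 1.0 / (1.0 + np.exp(-z))

e = np.array([0.0, 0.1, 0.5, 2.0])   # per-pixel coding residuals (toy values)
w = rsc_weights(e, mu=8.0, delta=0.25)
print(np.round(w, 3))
```

A clean pixel (e = 0) keeps a weight close to 1, the pixel at e^2 = δ sits exactly at weight 0.5, and a grossly occluded pixel (e = 2) is effectively switched off.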
The IR3C algorithm can solve the RSC model; however, it usually needs more than 10 iterations to converge, and each iteration has high computational complexity. To reduce the computational cost and enhance the robustness, we propose the improved robust sparse coding (iRSC) algorithm presented in the next section.
3. The Improved Robust Sparse Coding (iRSC)
In both SRC and RSC, all training samples compose the over-complete dictionary, and each query sample is represented as a sparse linear combination over the dictionary. The sparsity constraint on the coding coefficients and the iterative solving algorithm make the computational cost of RSC very high. Since the dictionary is over-complete for sparse representation (for example, the AR database has 100 classes, and practical systems may have even more), the dictionary can be reduced in the iterative steps that calculate the weight matrix W.
After each iteration step, the classes with larger coding residuals are eliminated from the dictionary, and only a fraction of the classes, determined by the retention factor, is kept for the next iteration.

Figure 2. (a) The size reduction curve of the dictionary on the AR database. (b) The convergence curves of RSC and iRSC; the difference of the coding coefficients between two adjacent iterations is used as the convergence measure.
Algorithm 1. The improved robust sparse coding (iRSC).
where
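The dictionary-reduction step at the heart of iRSC can be sketched as below: after a coding step, the classes with the largest class-wise residuals are dropped, and only a fraction of the classes is carried into the next iteration. The retention ratio, function name, and toy data are illustrative; this is a sketch of the reduction step alone, not the full iRSC loop.

```python
import numpy as np

def reduce_dictionary(A, labels, x_hat, y, keep_ratio=0.5):
    """Keep only the classes with the smallest class-wise coding residuals.

    Returns the reduced dictionary and the labels of its remaining columns.
    """
    classes = np.unique(labels)
    res = []
    for c in classes:
        delta = np.where(labels == c, x_hat, 0.0)   # class-c coding entries only
        res.append(np.linalg.norm(y - A @ delta))
    n_keep = max(1, int(np.ceil(keep_ratio * len(classes))))
    kept = classes[np.argsort(res)[:n_keep]]        # small-residual classes stay
    mask = np.isin(labels, kept)
    return A[:, mask], labels[mask]

# toy check: the query comes from class 2, which must survive the reduction
rng = np.random.default_rng(3)
A = rng.normal(size=(30, 20))
labels = np.repeat(np.arange(5), 4)
x = np.zeros(20)
x[8:12] = 1.0                                       # class-2 coefficients
y = A @ x
A2, l2 = reduce_dictionary(A, labels, x, y, keep_ratio=0.4)
print(sorted(set(l2.tolist())))
```

Because the well-matching classes are never discarded, the reduced dictionary stays over-complete enough for sparse coding while each subsequent iteration becomes cheaper.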
4. Experimental Results
In this paper, we focus on face recognition with real disguise. Therefore, we conduct our experiments on the AR face database [16], which contains samples with sunglasses or scarf; see Figure 1. We compare iRSC with SRC [12] and RSC [14], which are the benchmark sparse representation methods for face recognition. A subset of the AR database is used in the experiments: 600 images (6 non-occluded frontal-view samples per class, 3 from Session 1 and 3 from Session 2) from 100 subjects (50 males and 50 females) for training, and 200 images (2 samples per class, with sunglasses or scarf) from 100 subjects for testing. Figure 3 shows 6 training samples with facial expression changes and 2 testing samples with neutral expression from the first subject in the AR database.

Figure 3. (a) Six training samples and (b) two testing samples from the first subject in the AR database.
The images are resized to 42 × 30; the parameters μ and δ are set the same as in [15], and the regularization parameter λ is set to 0.001 by default. Figure 4(a) shows a test image with sunglasses; Figure 4(b) is the training sample associated with the maximum coding entry; Figures 4(c) and 4(d) show the minimum of residuals and the final weight map by RSC and iRSC, respectively.

Figure 4. An example of face recognition with disguise using RSC and iRSC. (a) A test image with sunglasses. (b) The training sample associated with the maximum coding entry by both RSC and iRSC. (c) and (d) The minimum of residuals and the final weight map by RSC and iRSC, respectively.
In Figure 5, (a) and (b) show the sparse coding of the test sample and the residuals of each class by RSC. Only one training sample has a large coding entry, while the others are close to zero; likewise, only the residual associated with the identified subject is close to zero, while the others are very large. (c) and (d) show the corresponding results by iRSC. As a result of the size reduction of the dictionary, the coding becomes even sparser, while the same recognition result is achieved.

Figure 5. (a) The sparse coding of the test sample by RSC; the identified sample is marked. (b) The residuals of each class by RSC. (c) The sparse coding of the test sample by iRSC. (d) The residuals of each class by iRSC.
The face recognition results by SRC, RSC and iRSC are listed in Table 2. Although the dictionary is reduced in iRSC, it still achieves recognition rates competitive with RSC for both the sunglasses and the scarf disguise. SRC performs poorly with the scarf disguise (only 38% accuracy), in which about 40% of the face region is covered; the reason is that SRC cannot handle occlusions larger than about 30%.
Table 2. Recognition rates by competing methods on the AR database with disguise occlusion.
In the experiments, the programming environment is Matlab 7.0a, running on a computer with a 3.10 GHz Intel(R) Core(TM) i5-2400 CPU and 4.00 GB RAM. The average runtimes of the three methods are listed in Table 3. As a result of the size reduction of the dictionary, the average runtime of iRSC is much shorter than those of both SRC and RSC, since the dictionary shrinks in each iteration and the cost per iteration decreases accordingly.
Table 3. Average runtimes by competing methods on the AR database with disguise occlusion.
Here, we conduct a more challenging task in which the testing samples have more facial expressions. Another subset of the AR database is used in this experiment, consisting of 700 training images (7 non-occluded frontal-view samples per class). The testing set consists of 600 images (6 samples per class, with sunglasses or scarf). The other parameters are the same as in the first experiment. The face recognition accuracies and average runtimes by SRC, RSC and iRSC are listed in Table 4. The iRSC with a fixed retention ratio achieves better results than the one using Eq. (10) in this case, while RSC still attains the highest accuracy among them. The different ways to choose the retention factor thus lead to different trade-offs between accuracy and runtime.
Table 4. Recognition rates and average runtimes (in parentheses, in seconds) by competing methods on the AR database with disguise occlusion.
5. Conclusion
This paper presented an improved robust sparse coding (iRSC) algorithm for robust face recognition with real disguise. The advantages of RSC, namely its robustness to various types of outliers and to large occluded regions, are well preserved. By reducing the size of the dictionary in each iterative step, iRSC lowers its computational complexity significantly: its average runtime is only about 16% of that of RSC. In this process, the over-complete property of the dictionary is not affected; therefore, iRSC still achieves recognition rates competitive with RSC. The experimental results on the AR face database demonstrate that iRSC has better overall performance than SRC and RSC. With a high recognition rate and low computational cost, iRSC is a good candidate for practical robotic systems to fulfill robust face recognition tasks.
