Sage Journals: Discover world-class research

Abstract

Face recognition via representation-based classification has become a popular technique in recent years. However, the recognition performance of systems using this technique degrades in unconstrained environments. In this article, a novel framework is proposed for representation-based face recognition. To deal with unconstrained environments, a preprocessing step is used to frontalize face images, and aligned downsampling local binary pattern features of the frontalized images are used for classification. Dimension reduction is then performed via an optimized projection matrix in order to reduce the computational complexity. Recognition is carried out using an improved robust sparse coding algorithm, which is expected to avoid the overfitting problem. An open-universe test on the Labeled Faces in the Wild data set shows that the recognition rate of the proposed system can reach 95% at a recall rate of 80%, the best among the representation-based classification face recognition systems compared.

Keywords

Face recognition; alignment downsampling local binary pattern; robust sparse coding; projection matrix optimization; nonnegative sparse representation

Introduction

Face recognition (FR), a popular area of research in computer vision and machine learning, has been widely studied for decades.^1,2 Automatic FR systems, including those using representation-based classification (RC) techniques such as sparse RC (SRC),³ have achieved very impressive performance on large-scale image sets in constrained environments (cooperative users under controlled indoor illumination).^4,5 Enhancing the performance of SRC-based FR systems in unconstrained environments, however, remains a challenge. As SRC is a nondeterministic polynomial-time (NP)-hard combinatorial problem, it is usually relaxed into an l₁-norm-based convex problem, which can be solved with iterative shrinkage/thresholding algorithms.⁶ It has recently been found that a sparser solution can be obtained by solving a non-convex relaxation rather than the convex one.⁷ However, solving the non-convex relaxation is usually very costly, which limits its use in real-time applications.

It has been shown that SRC with a nonnegative coefficient constraint is more effective than unconstrained SRC,^6,8 mainly because the constraint helps avoid overfitting. The iterative algorithms proposed in the Bioucas-Dias and Figueiredo study⁶ can be unrolled into a finite number of iterations, each of which resembles a typical neural network layer. Consequently, a long sequence of iterations can be treated as a deep learning (DL) network with shared layer weights,⁹ in which the rectified linear unit (ReLU) corresponds to the nonnegative coefficient constraint. However, DL is well known to be computationally very expensive.
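To make the unrolling argument concrete, the following minimal NumPy sketch (an illustration, not the method of the cited works) runs nonnegative iterative shrinkage/thresholding for $\min_{s \geq 0} \frac{1}{2}\| x - Ψ s \|_2^2 + λ \| s \|_1$. Every iteration applies the same affine map followed by a ReLU, so `n_layers` iterations form a feed-forward network whose layers share the weights `W_e`, `S` and the bias `theta`. The function name, the penalty `lam`, and the fixed layer count are assumptions chosen for illustration.

```python
import numpy as np


def nonneg_ista_unrolled(Psi, x, lam=0.01, n_layers=200):
    """Nonnegative ISTA written as n_layers identical 'layers' (illustrative sketch).

    Solves min_{s >= 0} 0.5 * ||x - Psi s||^2 + lam * ||s||_1.
    Each layer computes s <- ReLU(W_e x + S s - theta) with weights shared by all layers.
    """
    t = 1.0 / np.linalg.norm(Psi, 2) ** 2           # step size = 1 / Lipschitz constant
    W_e = t * Psi.T                                  # shared input weight
    S = np.eye(Psi.shape[1]) - t * (Psi.T @ Psi)     # shared recurrent weight
    theta = t * lam                                  # shared bias (shrinkage threshold)
    s = np.zeros(Psi.shape[1])
    for _ in range(n_layers):
        # the ReLU enforces s >= 0 and performs the l1 shrinkage in one step
        s = np.maximum(W_e @ x + S @ s - theta, 0.0)
    return s


rng = np.random.default_rng(0)
Psi = rng.standard_normal((50, 20))
s_true = np.zeros(20)
s_true[[2, 7, 11]] = [1.0, 0.5, 2.0]
x = Psi @ s_true
s_hat = nonneg_ista_unrolled(Psi, x)
```

Training `W_e`, `S`, and `theta` instead of fixing them is what turns this fixed iteration into a learned network, which is where the computational cost of DL enters.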

It should be pointed out that representation-based classification techniques cannot be used directly for FR in unconstrained environments. An algorithm for face alignment by sparse and low-rank matrix decomposition (SLMD) was proposed by Wu et al.;¹⁰ however, it is not applicable to large-scale image data sets. Conventional face alignment is based on affine transforms with two-dimensional (2D) model-based face detection and registration.^11,12 More recently, Hassner et al.¹³ proposed a face frontalization framework with a three-dimensional (3D) reference model, which is more effective for FR and gender estimation. A multi-view FR method based on tensor subspace analysis was proposed by Gao and Tian,¹⁴ but it is difficult to apply in uncontrolled environments. Xu et al.¹⁵ used mirror face images to improve recognition performance; the improvement is, however, very limited because the multi-view faces are not frontalized. Yang et al.¹⁶ proposed a robust sparse coding (RSC) method, which gives the maximum likelihood estimation (MLE) solution of the sparse coding problem. However, the RSC algorithm involves iterative re-weighting and is costly for large-scale data sets.

In this article, we propose a 3D model–based robust local nonnegative sparse representation classification (RLNSRC) to deal with FR in uncontrolled scenarios. The main contributions of this article are as follows:

Instead of the raw face image, the aligned downsampling local binary pattern (ADLBP) feature of a 3D-frontalized face image is used. This allows uncontrolled environments to be handled effectively.

A compressive sensing–based RLNSRC (CS-RLNSRC) scheme is proposed for FR, and an algorithm is derived for designing the optimal projection matrix used to compress the ADLBP feature signals. The scheme is intended to reduce the computational complexity and to prevent overfitting.

The rest of the article is organized as follows. The “Related works and problem formulation” section reviews existing FR systems based on representation-based classification that are closely related to the present work. Our main contribution is presented in the “A novel framework for FR” section, in which a CS-RLNSRC framework is proposed. In the “Experiment results” section, experiments are carried out to examine the performance of the proposed system. Finally, concluding remarks are given in the “Conclusion” section.

Linear regression classification

In linear regression classification (LRC), the coefficient vectors are obtained from the following minimization:

$\hat{s}_{j} = \arg\min_{s} \| x - Ψ_{j} s \|_{2}^{2}, \quad \forall j$ (5)

As $N > L_{j}$ usually holds, problem (5) is overdetermined; provided that $Ψ_{j}$ has full column rank, $Ψ_{j}^{T} Ψ_{j}$ is invertible and $\hat{s}_{j}$ is given by

${\hat{s}}_{j} = {(Ψ_{j}^{T} Ψ_{j})}^{- 1} Ψ_{j}^{T} x$ (6)

The corresponding classification deviations $\{σ_{j}^{2}\}$ can be computed using equation (3), and the classification can be done using equation (4).
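A minimal NumPy sketch of LRC follows. Equations (3) and (4) are not reproduced in this section, so the code assumes, as in the original LRC formulation, that the deviation is the squared residual $σ_j^2 = \| x - Ψ_j \hat{s}_j \|_2^2$ and that $x$ is assigned to the class with the smallest deviation. `np.linalg.lstsq` computes the least-squares solution of (5) without forming the inverse in (6) explicitly; the function name is hypothetical.

```python
import numpy as np


def lrc_classify(Psi_list, x):
    """Linear regression classification (sketch).

    Psi_list[j] is the N x L_j class dictionary Psi_j (with N > L_j).
    Returns the index of the class with the smallest deviation and all deviations.
    """
    sigma2 = []
    for Psi_j in Psi_list:
        # least-squares solution of (5); equals (6) when Psi_j has full column rank
        s_j, *_ = np.linalg.lstsq(Psi_j, x, rcond=None)
        sigma2.append(float(np.sum((x - Psi_j @ s_j) ** 2)))  # assumed form of (3)
    return int(np.argmin(sigma2)), np.array(sigma2)          # assumed form of (4)


rng = np.random.default_rng(1)
Psi_list = [rng.standard_normal((30, 4)) for _ in range(3)]
x = Psi_list[1] @ np.array([0.5, -1.0, 2.0, 0.3])   # query drawn from class 1
j_hat, sigma2 = lrc_classify(Psi_list, x)
```

Solving via `lstsq` rather than the explicit inverse in (6) is numerically safer when $Ψ_j^T Ψ_j$ is ill-conditioned.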

Sparse representation–based classification

SRC assumes that $x$ can be represented as $x = Ψ s$ with a sparse coefficient vector $s$ whose nonzero elements correspond to one class only. The coefficient vector is then sought as

$\hat{s} ≜ \arg\min_{s} \| s \|_{0} \quad \text{s.t.} \quad x = Ψ s$ (7)

where $\| v \|_{0}$ denotes the number of nonzero elements in the vector $v$. Finding the solution to equation (7) is NP-hard in general, because it is a combinatorial (and non-convex) optimization problem. Suboptimal solutions can be found by greedy methods such as matching pursuit or orthogonal matching pursuit. Alternatively, the problem can be relaxed by replacing the $l_{0}$ norm in equation (7) with the $l_{1}$ norm as follows

$\hat{s} ≜ \arg\min_{s} \| s \|_{1} \quad \text{s.t.} \quad x = Ψ s$ (8)

It can be shown¹⁷ that, under certain conditions, the solution of the l₀-based problem (7) coincides with that of equation (8), which can be solved efficiently by convex optimization, such as the l₁/l₂-based algorithm.¹⁸
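The greedy route mentioned above, orthogonal matching pursuit (OMP), can be sketched in NumPy as follows. The function name, the unit-norm-column assumption, and the stopping tolerance are illustrative choices. At each step the atom most correlated with the current residual is added to the support, and all selected coefficients are re-fitted jointly by least squares.

```python
import numpy as np


def omp(Psi, x, k, tol=1e-10):
    """Orthogonal matching pursuit: greedy approximation to (7) with at most k nonzeros.

    The columns of Psi are assumed to have unit l2 norm.
    """
    x = np.asarray(x, dtype=float)
    s = np.zeros(Psi.shape[1])
    support, coef = [], np.zeros(0)
    residual = x.copy()
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break                                       # x is already represented exactly
        idx = int(np.argmax(np.abs(Psi.T @ residual)))  # atom best matching the residual
        if idx in support:
            break                                       # no new atom can be selected
        support.append(idx)
        # "orthogonal" step: jointly re-fit all selected coefficients by least squares
        coef, *_ = np.linalg.lstsq(Psi[:, support], x, rcond=None)
        residual = x - Psi[:, support] @ coef
    if support:
        s[support] = coef
    return s


rng = np.random.default_rng(2)
Psi = rng.standard_normal((40, 100))
Psi /= np.linalg.norm(Psi, axis=0)       # unit-norm atoms
s_true = np.zeros(100)
s_true[[5, 37]] = [1.0, -2.0]
s_hat = omp(Psi, Psi @ s_true, k=2)
```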

Similarly, the classification deviations $\{σ_{j}^{2}\}$ corresponding to the obtained $\hat{s}$ can be computed using equation (3), and the classification can be done using equation (4).
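The SRC pipeline can be sketched in NumPy under two stated assumptions: the equality-constrained problem (8) is replaced by its Lagrangian form $\min_s \frac{1}{2}\| x - Ψ s \|_2^2 + λ \| s \|_1$ with a small λ, solved by plain iterative shrinkage/thresholding; and equation (3) is taken to be the class-wise residual $\| x - Ψ δ_j(\hat{s}) \|_2^2$, where $δ_j(\cdot)$ keeps only the coefficients of class $j$, as in the original SRC. The function names and the `labels` layout are hypothetical.

```python
import numpy as np


def ista_l1(Psi, x, lam=1e-3, n_iter=500):
    """ISTA for min 0.5 * ||x - Psi s||_2^2 + lam * ||s||_1 (Lagrangian stand-in for (8))."""
    t = 1.0 / np.linalg.norm(Psi, 2) ** 2
    s = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        v = s - t * (Psi.T @ (Psi @ s - x))                     # gradient step
        s = np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)   # soft thresholding
    return s


def src_classify(Psi, labels, x, lam=1e-3):
    """labels[i] is the class of column i of Psi (hypothetical layout)."""
    labels = np.asarray(labels)
    s = ista_l1(Psi, x, lam)
    classes = np.unique(labels)
    # class-wise deviation: keep only the coefficients of class j (assumed form of (3))
    sigma2 = np.array([np.sum((x - Psi @ np.where(labels == j, s, 0.0)) ** 2) for j in classes])
    return int(classes[np.argmin(sigma2)]), sigma2


rng = np.random.default_rng(11)
Psi = rng.standard_normal((60, 30))
Psi /= np.linalg.norm(Psi, axis=0)               # unit-norm columns, 10 per class
labels = np.repeat(np.arange(3), 10)
x = Psi[:, 10:20] @ rng.uniform(0.5, 1.5, 10)   # query built from class 1
j_hat, sigma2 = src_classify(Psi, labels, x)
```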

Robust sparse coding

In Yang et al.,¹⁶ a robust representation-based classification approach was proposed. Denote the representation residual as

$ξ ≜ x - Ψ s$

Assume that $ξ(1), ξ(2), \dots, ξ(N)$ are independent and identically distributed (i.i.d.), so that their joint probability density function is $f(ξ) = \prod_{n = 1}^{N} f_{0}(ξ_{n})$. Maximizing this likelihood function is equivalent to minimizing the objective function

$F (ξ) ≜ - \ln [f (ξ)] = - \sum_{n = 1}^{N} \ln [f_{0} (ξ_{n})] ≜ \sum_{n = 1}^{N} ρ_{0} (ξ_{n})$ (9)

The optimal sparse vector is then obtained by solving

$\hat{s} ≜ \arg\min_{s} F(ξ) \quad \text{s.t.} \quad \| s \|_{0} \leq κ$ (10)

As this is a non-convex problem, Yang et al.¹⁶ proposed the following iterative method.

Define

$\tilde{F} (ξ) ≜ F (ξ_{k}) + {(ξ - ξ_{k})}^{T} F' (ξ_{k}) + \frac{1}{2} {(ξ - ξ_{k})}^{T} R_{k} (ξ - ξ_{k})$ (11)

where $F'(ξ_{k}) ≜ dF(ξ)/dξ |_{ξ = ξ_{k}}$ and $R_{k}$ is a diagonal matrix. Equation (11) can be viewed as the first-order Taylor approximation of $F(ξ)$ at $ξ = ξ_{k}$, augmented with a quadratic term in place of the higher-order remainder. Requiring $\tilde{F}(ξ)$ to reach its minimum at $ξ = 0$, as $F(ξ)$ does, means $F'(ξ_{k}) - R_{k} ξ_{k} = 0$, which gives

$R_{k} (n, n) = \frac{F' (ξ_{k}) (n)}{ξ_{k} (n)}, \forall n$ (12)

and hence

$\tilde{F} (ξ) = \frac{1}{2} ξ^{T} R_{k} ξ + F (ξ_{k}) - ξ_{k}^{T} F' (ξ_{k}) + \frac{1}{2} ξ_{k}^{T} R_{k} ξ_{k}$

Replacing $F(ξ)$ with $\tilde{F}(ξ)$ in equation (10), and dropping the terms that do not depend on $ξ$ together with the factor $\frac{1}{2}$, the $k$th iterate is obtained from

$\hat{s}^{(k)} ≜ \arg\min_{s} \| W_{k}(x - Ψ s) \|_{2}^{2} \quad \text{s.t.} \quad \| s \|_{0} \leq κ$ (13)

where $R_{k} = W_{k}^{2}$ .

$W$ can then be updated with equation (12) using $ξ_{k + 1} = x - Ψ \hat{s}^{(k)}$. Define $\hat{W} ≜ \lim_{k \to +\infty} W_{k}$ and $\hat{s} ≜ \lim_{k \to +\infty} \hat{s}^{(k)}$. Here, instead of equation (3), $σ_{j}^{2}$ is evaluated with

$σ_{j}^{2} ≜ ∥ \hat{W} (x - Ψ_{j} {\hat{s}}_{j}) ∥_{2}^{2}, \forall j$ (14)

and $x$ is then classified using equation (4).

To reduce the complexity of updating $W$ with equation (12), the following update was proposed in Yang et al.¹⁶

$W_{k} (n, n) ≜ \frac{e^{μ δ - μ ξ_{k}^{2} (n)}}{1 + e^{μ δ - μ ξ_{k}^{2} (n)}}, \forall n$ (15)

where $μ$ and $δ$ are two positive constants and $ξ_{0} ≜ x - \bar{x}$ , where $\bar{x}$ is the mean of all training samples.
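The reweighting loop can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation of Yang et al.: the constraint $\| s \|_0 \leq κ$ in (13) is replaced by an $l_1$ penalty solved with iterative shrinkage/thresholding, the number of outer iterations is fixed, and `mu`, `delta`, and all function names are illustrative. The weights follow equation (15), written in the equivalent overflow-safe form $1 / (1 + e^{-z})$ with $z = μ(δ - ξ^2)$.

```python
import numpy as np


def rsc_weights(xi, mu, delta):
    """Diagonal of W_k from equation (15), a logistic function of xi**2."""
    z = mu * (delta - xi ** 2)
    return np.exp(-np.logaddexp(0.0, -z))        # = e^z / (1 + e^z), overflow-safe


def weighted_l1_ls(Psi, x, w, lam=1e-3, n_iter=300):
    """ISTA for min 0.5 * ||diag(w)(x - Psi s)||^2 + lam * ||s||_1.

    The l1 penalty stands in for the constraint ||s||_0 <= kappa in (13).
    """
    A, b = w[:, None] * Psi, w * x
    t = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)
    s = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        v = s - t * (A.T @ (A @ s - b))
        s = np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
    return s


def rsc(Psi, x, x_bar, mu, delta, n_outer=5):
    """Iteratively reweighted sparse coding; xi_0 = x - x_bar as in the text."""
    xi = x - x_bar
    for _ in range(n_outer):
        w = rsc_weights(xi, mu, delta)           # update W_k
        s = weighted_l1_ls(Psi, x, w)            # solve the weighted problem (13)
        xi = x - Psi @ s                         # xi_{k+1}
    return s, w


rng = np.random.default_rng(3)
Psi = rng.standard_normal((50, 10))
s_true = rng.uniform(0.5, 1.5, 10)
x = Psi @ s_true
x[:5] += 50.0                                    # gross corruption of five entries
s_hat, w = rsc(Psi, x, x_bar=np.zeros(50), mu=0.5, delta=25.0)
```

In this toy run, the grossly corrupted entries receive weights close to zero, so they barely influence the recovered coefficients; each outer iteration, however, requires a full weighted sparse coding solve, which is the cost the text refers to.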

Nonnegative coefficient sparse representation–based classification

The nonnegative coefficient sparse representation–based classification is a constrained SRC, formulated as

$\hat{s}_{j} ≜ \arg\min_{s} \| s \|_{0} \quad \text{s.t.} \quad x = Ψ_{j} s, \ s \geq 0, \quad \forall j$ (16)

This is a non-convex problem and is hard to solve. However, when the nonnegative sparse solution of equation (16) exists and is unique, it coincides with the solution of nonnegative linear regression (nonnegative least squares).¹⁹ Therefore, equation (16) is replaced by

$\hat{s}_{j} ≜ \arg\min_{s_{j}} \| x - Ψ_{j} s_{j} \|_{2}^{2} \quad \text{s.t.} \quad s_{j} \geq 0, \quad \forall j$ (17)

which can be solved by the nonnegative least squares function lsqnonneg.m in MATLAB.²⁰

The classification deviations $\{σ_{j}^{2}\}$ and the classification are then obtained in the same way as in LRC.
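For readers without MATLAB, problem (17) can be sketched in NumPy with a simple projected-gradient solver. This is a substitute chosen for brevity: MATLAB's `lsqnonneg` uses an active-set method instead. The function names are hypothetical, and the classification step assumes the same squared-residual deviation as LRC.

```python
import numpy as np


def nnls_pg(A, b, n_iter=2000):
    """Projected-gradient solver for min ||b - A s||_2^2 s.t. s >= 0.

    A simple stand-in for MATLAB's lsqnonneg, which uses an active-set method instead.
    """
    t = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)   # step 1 / ||A||_2^2
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step on 0.5 * ||b - A s||^2, then projection onto s >= 0
        s = np.maximum(s - t * (A.T @ (A @ s - b)), 0.0)
    return s


def nn_lrc_classify(Psi_list, x):
    """Nonnegative LRC: solve (17) for each class, then pick the smallest residual."""
    sigma2 = [float(np.sum((x - P @ nnls_pg(P, x)) ** 2)) for P in Psi_list]
    return int(np.argmin(sigma2)), np.array(sigma2)


rng = np.random.default_rng(10)
Psi_list = [rng.standard_normal((30, 4)) for _ in range(3)]
x = Psi_list[2] @ np.array([0.5, 1.0, 2.0, 0.3])     # nonnegative combination from class 2
j_hat, sigma2 = nn_lrc_classify(Psi_list, x)
```

For example, `nnls_pg(np.eye(2), np.array([1.0, -1.0]))` clips the negative least-squares coefficient to zero and returns `[1.0, 0.0]`.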

Problem formulation

It should be pointed out that all the representation-based classification approaches reviewed above work well only in constrained environments. Real-life environments are usually unconstrained, and hence a preprocessing step such as frontalization is needed. Also, the RSC algorithm presented above is very slow because of the step that updates the weighting matrix $W$, so a simplification is required. Moreover, as a raw face image contains considerable redundancy that affects recognition, it is desirable to use the discriminative features of the image for recognition. Therefore, we propose a novel FR scheme that, besides a frontalization stage, consists of local binary pattern (LBP) feature extraction, compressive sensing–based dimension reduction, and classification using robust nonnegative coefficient SRC. Such a system is referred to as CS-RLNSRC.

A novel framework for FR

Figure 1 depicts the proposed FR scheme. Such a system consists of four main modules that will be described in the following subsections.

Figure 1.

The proposed CS-RLNSRC face recognition system.

Image pre-process

In this stage, we concentrate on face frontalization. The method used here is similar to that proposed by Hassner et al.¹³ The basic idea is to extract facial features, match them to a 3D face model, and then frontalize the matched model to obtain a frontal face. Figures 2 and 3 show six original face images of the same person and the corresponding frontalized images, respectively. In traditional representation-based classification, many unfrontalized face images would be put into the dictionary to deal with unconstrained environments. Frontalization therefore not only handles complicated environments but also avoids increasing the size of the dictionary, as the dictionary contains frontal face images only.

Figure 2.

Six images from LFW data set.

Figure 3.

The six frontalized face images.

Feature extraction

After frontalization, the images are processed with a downsampling LBP algorithm to reduce the effects of illumination and misalignment.²¹ Let $Y_{DLBP}$ denote the output image matrix of this algorithm. SLMD is then applied to eliminate the face deformation in $Y_{DLBP}$:

$Y ≜ \arg\min_{\tilde{Y}, E} \left\{ \operatorname{rank}(\tilde{Y}) + \| E \|_{0} \right\} \quad \text{s.t.} \quad Y_{DLBP} = \tilde{Y} + E$

where $E$ is the deformable residual and $Y$ is the ADLBP feature data matrix of the input face image.

The augmented Lagrange multiplier method can be used in the SLMD process.²²
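The feature-extraction steps of this subsection can be sketched in NumPy as follows. Several parts are assumptions, because the downsampling LBP algorithm of reference 21 is not specified here: the LBP operator is the basic 3 × 3, 8-neighbour version, and downsampling is plain block averaging. For SLMD, the sketch solves the usual convex relaxation (nuclear norm for rank, $l_1$ norm for $l_0$) with the inexact augmented Lagrange multiplier scheme commonly used for robust PCA, with the common default weight $1/\sqrt{\max(m, n)}$. All function names are hypothetical.

```python
import numpy as np


def lbp_3x3(img):
    """Basic 8-neighbour LBP code of every interior pixel of a 2-D array."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code |= (neighbour >= center).astype(np.int64) << bit   # one bit per neighbour
    return code


def block_mean_downsample(a, f):
    """Average non-overlapping f x f blocks (trailing rows/columns are cropped)."""
    a = np.asarray(a, dtype=float)
    H, W = (a.shape[0] // f) * f, (a.shape[1] // f) * f
    return a[:H, :W].reshape(H // f, f, W // f, f).mean(axis=(1, 3))


def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
    """Inexact ALM for min ||L||_* + lam * ||E||_1 s.t. D = L + E."""
    D = np.asarray(D, dtype=float)
    lam = 1.0 / np.sqrt(max(D.shape)) if lam is None else lam
    norm2 = np.linalg.norm(D, 2)
    Y = D / max(norm2, np.abs(D).max() / lam)       # dual variable initialisation
    mu, mu_max, rho = 1.25 / norm2, 1.25e7 / norm2, 1.5
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # low-rank update: singular value thresholding
        U, sv, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        L = (U * np.maximum(sv - 1.0 / mu, 0.0)) @ Vt
        # sparse update: element-wise soft thresholding
        T = D - L + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Z = D - L - E
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z) <= tol * np.linalg.norm(D):
            break
    return L, E


rng = np.random.default_rng(5)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))   # rank-2 part
E0 = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.03
E0[mask] = 10.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))     # sparse deformation
L_hat, E_hat = rpca_ialm(L0 + E0)
```

The LBP code depends only on the ordering of gray levels, which is why it is insensitive to monotonic illumination changes; the low-rank/sparse split then separates a structured component from sparse deformations.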

Dimension reduction

The obtained feature matrix $Y$ is rearranged into a vector $y ≜ vec[Y] \in ℜ^{M \times 1}$ by stacking all the column vectors of $Y$. As $M$ is usually very large, recognition directly using $y$ can be computationally prohibitive, so dimension reduction is needed. This can be achieved with a projection

$x = Φ y$ (18)

where $Φ \in ℜ^{N \times M}$, with $N < M$, is called the projection matrix.

Recognition can then be carried out much more efficiently using $x$. Classically, a random projection matrix is adopted. Haupt et al.²³ gave the following result on the performance of a classifier using a random projection.

Theorem 1

Let $S_{y} ≜ \{y_{l}\}_{l = 1}^{L}$ be a set of $L$ samples with $y_{l} \in ℜ^{M \times 1}$ and $\| y_{l} \|_{2}^{2} = 1, \forall l$. Assume that the elements of $Φ \in ℜ^{N \times M}$ are i.i.d. following $N(0, 1/N)$ and

$x_{l} = Φ y_{l} + e_{l}, \forall l$ (19)

where $e_{l}$ is the noise in the measurement domain, whose elements are assumed to be i.i.d. following $N(0, σ^{2})$. Let $\hat{y}_{l}$ be the estimate of $y_{l}$ obtained with

${\hat{y}}_{l} ≜ \arg min_{y \in S_{y}} | | x_{l} - Φ y | |_{2}^{2}$ (20)

Then, the probability of an incorrect estimate, that is, the false acceptance rate (FAR), is bounded by

$\Pr(\hat{y}_{l} \neq y_{l}) \leq (L - 1) \left(1 + \frac{ρ_{\min}}{4 σ^{2} M}\right)^{- N / 2}$ (21)

where $ρ_{\min} ≜ \min_{k \neq l} ρ_{E}(y_{k}, y_{l})$.
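The right-hand side of (21) can be evaluated numerically to see how the bound behaves. The NumPy sketch below assumes that $ρ_E$ is the squared Euclidean distance and takes the minimum over all pairs, so the value bounds the error for every $l$; `sigma2` is the noise variance $σ^2$ in (21), and the function name and example sizes are hypothetical.

```python
import numpy as np


def misclassification_bound(Y, N, sigma2):
    """Right-hand side of equation (21) for a projection to N dimensions.

    Y      : M x L matrix whose columns are the unit-norm samples y_l.
    sigma2 : noise variance sigma^2 in equation (21).
    rho_E is taken to be the squared Euclidean distance (assumption), and the
    minimum is taken over all pairs so that the bound holds for every l.
    """
    M, L = Y.shape
    G = Y.T @ Y
    sq = np.diag(G)
    D2 = sq[:, None] + sq[None, :] - 2.0 * G            # pairwise squared distances
    rho_min = D2[~np.eye(L, dtype=bool)].min()
    return (L - 1) * (1.0 + rho_min / (4.0 * sigma2 * M)) ** (-N / 2.0)


rng = np.random.default_rng(6)
Y = rng.standard_normal((1024, 50))
Y /= np.linalg.norm(Y, axis=0)                         # unit-norm samples, as in Theorem 1
bounds = [misclassification_bound(Y, N, sigma2=1e-3) for N in (100, 200, 400)]
```

Because the base exceeds one, the bound shrinks geometrically as $N$ grows, while it grows linearly with the number of candidate samples $L$.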

In the representation-based classification framework, the ADLBP feature signal $y$ of an image is modeled as

$y = As$

where $s \in ℜ^{L \times 1}$ is the coefficient vector and $A \in ℜ^{M \times L}$ is the dictionary in ADLBP feature domain. Combining it with equation (18), one has

$x = Φ As ≜ Ψ s$ (22)

where $Ψ = Φ A$ is the dictionary of the images in the measurement domain with $A$ formed using the ADLBP feature vectors of the training face images.

In the classical SRC-based FR framework, the signals are the original (frontal) face images, whereas in the proposed FR scheme the signals are the compressed versions $x$ of the ADLBP feature vectors $y$ of these face images, obeying equation (22).

Compressive sensing theory suggests that an optimized projection (sensing) matrix can significantly outperform a random one in preserving the information contained in high-dimensional signals and in enhancing recovery accuracy.^24,25 The problem of optimizing the projection matrix was first studied by Elad,²⁴ and a large class of existing algorithms can be formulated as²⁵

$\min_{Φ} \| G - G_{t} \|_{F}^{2} \quad \text{s.t.} \quad G = Ψ^{T} Ψ$ (23)

where $\| \cdot \|_{F}$ is the Frobenius norm, $G$ is the Gram matrix of the equivalent dictionary $Ψ ≜ Φ A$ defined in equation (22), with the ADLBP-domain dictionary $A$ assumed to be given, and $G_{t}$ is a target Gram matrix. This formulation is based on the argument that reducing the mutual coherence of the equivalent dictionary can enhance the signal recovery accuracy in a compression-oriented compressive sensing system.²⁵ Algorithms for designing the optimal $Φ$ differ mainly in the choice of $G_{t}$, which is taken from the set of symmetric matrices whose diagonal elements all equal one and whose off-diagonal elements are smaller than one in absolute value.

Denote by $G_{A}$ the Gram matrix of the ADLBP dictionary $A = \begin{bmatrix} A_{1} & \cdots & A_{j} & \cdots & A_{J} \end{bmatrix}$, that is

$G_{A} ≜ A^{T} A = \begin{bmatrix} \bar{G}_{11} & \bar{G}_{12} & \cdots & \bar{G}_{1J} \\ \bar{G}_{21} & \bar{G}_{22} & \cdots & \bar{G}_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{G}_{J1} & \bar{G}_{J2} & \cdots & \bar{G}_{JJ} \end{bmatrix}$

where $\bar{G}_{ij} ≜ A_{i}^{T} A_{j}$.

In our proposed system, the target Gram matrix $G_{t}$ is chosen as

$G_{t} = Δ \odot G_{A}$ (24)

where $\odot$ denotes the element-wise (Hadamard) product and

$Δ ≜ \begin{bmatrix} Δ_{11} & Δ_{12} & \cdots & Δ_{1J} \\ Δ_{21} & Δ_{22} & \cdots & Δ_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ Δ_{J1} & Δ_{J2} & \cdots & Δ_{JJ} \end{bmatrix}$

with the blocks $Δ_{ij}$ partitioned as in $G_{A}$ and defined element-wise as

$Δ_{ij}(m, n) = \begin{cases} 1 - η, & i \neq j \\ 1 + η, & i = j,\ m \neq n \\ 1, & i = j,\ m = n \end{cases}$ (25)

where $0 \leq η \leq 1$ is a parameter that adjusts the inter-class and intra-class correlations.

The motivation for this choice of target Gram matrix is as follows. Partition the Gram matrix $G$ of $Ψ$ in the same way as $G_{t}$. A sub-matrix $G_{ij}$ with $i \neq j$ represents the cross-correlation between two different classes of subjects, while $G_{jj}$ reflects the correlation within the $j$th class. From the viewpoint of classification, it is desirable to design the projection matrix $Φ$ such that the elements of $G_{ij}$ ($i \neq j$) are as small as possible in magnitude, while the off-diagonal elements of $G_{jj}$ are enlarged. The target in equation (25) does this by scaling the inter-class correlations by $1 - η$ and the intra-class ones by $1 + η$, which is expected to enhance discrimination and hence classification.
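A NumPy sketch of equations (23) to (25) follows; it builds $Δ$ and $G_t$ and evaluates the Frobenius objective for a given $Φ$. It reads the product in (24) as element-wise, which keeps $G_t$ symmetric with a unit diagonal when the columns of $A$ have unit norm. The function names and the `labels` layout are hypothetical; `eta=0.16` mirrors the value used later in the experiments.

```python
import numpy as np


def target_gram(A, labels, eta):
    """G_t = Delta (element-wise) G_A, with Delta built from equation (25).

    A      : M x L ADLBP dictionary; labels[i] is the class of column i.
    eta    : parameter in [0, 1].
    """
    labels = np.asarray(labels)
    G_A = A.T @ A
    same_class = labels[:, None] == labels[None, :]
    Delta = np.where(same_class, 1.0 + eta, 1.0 - eta)  # i == j blocks vs i != j blocks
    np.fill_diagonal(Delta, 1.0)                         # i == j and m == n
    return Delta * G_A, Delta


def gram_mismatch(Phi, A, G_t):
    """Objective of equation (23): ||Psi^T Psi - G_t||_F^2 with Psi = Phi A."""
    Psi = Phi @ A
    return float(np.linalg.norm(Psi.T @ Psi - G_t, "fro") ** 2)


rng = np.random.default_rng(7)
A = rng.standard_normal((20, 6))
A /= np.linalg.norm(A, axis=0)                   # unit-norm atoms, so diag(G_A) = 1
labels = [0, 0, 0, 1, 1, 1]
G_t, Delta = target_gram(A, labels, eta=0.16)
```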

With $G_{t}$ given above, the optimal sensing matrix design problem (23) can be solved using the algorithm derived in Yu et al.²⁶ Specifically, let

$A = U_{A} \begin{bmatrix} Σ_{A} & 0 \end{bmatrix} V_{A}^{T}$

be a singular value decomposition of $A$, where $U_{A}$ and $V_{A}$ are orthogonal matrices of dimensions $M \times M$ and $L \times L$, respectively, and

$Σ_{A} = \begin{bmatrix} Σ_{11} & 0 \\ 0 & Σ_{22} \end{bmatrix} \in ℜ^{M \times M}$

is a diagonal matrix with $Σ_{A}(n, n) \geq Σ_{A}(n + 1, n + 1) > 0, \ \forall n$, where the sub-matrix $Σ_{11} \in ℜ^{N \times N}$ is assumed to be nonsingular. The solution of equation (23) is given by

$Φ = U \begin{bmatrix} Σ_{11} & 0 \end{bmatrix} \begin{bmatrix} V_{11} & 0 \\ 0 & V_{22} \end{bmatrix}^{T} Σ_{A}^{-1} U_{A}^{T}$ (26)

where $U \in ℜ^{N \times N}$ , $V_{11} \in ℜ^{N \times N}$ , and $V_{22} \in ℜ^{(M - N) \times (M - N)}$ are all arbitrary orthogonal matrices.

When $V_{11}$ and $V_{22}$ are set to identity matrices, a particular $Φ$ and the corresponding dictionary $Ψ$ are given by

$Φ = U \begin{bmatrix} I_{N} & 0 \end{bmatrix} U_{A}^{T}, \quad Ψ = U \begin{bmatrix} Σ_{11} & 0 \end{bmatrix} V_{A}^{T}$ (27)
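Equation (27) can be sketched directly in NumPy from an SVD of $A$. The sketch assumes, as in the text, that $A$ has full row rank with $M \leq L$; `U` defaults to the identity (any $N \times N$ orthogonal matrix is allowed), and the function name and sizes are hypothetical. It also checks that $Φ A$ reproduces $U [Σ_{11} \ 0] V_A^T$.

```python
import numpy as np


def projection_from_svd(A, N, U=None):
    """Phi and Psi of equation (27): Phi = U [I_N 0] U_A^T and Psi = Phi A.

    A must have full row rank M <= L; U is any N x N orthogonal matrix (identity by default).
    """
    U_A, sv, Vt_A = np.linalg.svd(A)     # A = U_A [Sigma_A 0] V_A^T
    U = np.eye(N) if U is None else U
    Phi = U @ U_A[:, :N].T               # [I_N 0] U_A^T keeps the N leading left singular vectors
    Psi = Phi @ A                        # equals U [Sigma_11 0] V_A^T
    return Phi, Psi


rng = np.random.default_rng(8)
A = rng.standard_normal((30, 50))        # M = 30 features, L = 50 training samples
Phi, Psi = projection_from_svd(A, N=10)
```

With `U` set to the identity, $Φ$ simply projects onto the $N$ leading left singular vectors of the dictionary, so its rows are orthonormal.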

With $Φ$ obtained, the ADLBP feature vector $y$ of a query face image can be compressed using equation (18), yielding a low-dimensional vector $x$. FR can then be carried out with $x$ and the dictionary $Ψ$ using one of the representation-based classification algorithms discussed in the “Related works and problem formulation” section. In the next subsection, we derive an improved RSC with the nonnegative coefficient constraint embedded.

An improved RSC

Recall the RSC in the “Robust sparse coding” section. As mentioned before, the procedure used in this algorithm for updating the weighting factors is time-consuming, which makes the algorithm unsuitable for real-time applications.

In the proposed CS-RLNSRC FR system, the coefficient vector for the jth class is obtained from the following nonnegative coefficient constrained minimization

$\hat{s}_{j} ≜ \arg\min_{s} \| x - Ψ_{j} s \|_{2}^{2} \quad \text{s.t.} \quad s \geq 0, \quad \forall j$ (28)

which can be solved by the nonnegative least squares function lsqnonneg.m in MATLAB. The representation residual, as defined before, is given by

$ξ_{j} = x - Ψ_{j} \hat{s}_{j}, \quad \forall j$

The classical assumption is that the noise follows a Gaussian distribution, which is not always the case in practice. When comparing faces, human eyes usually focus on the facial feature points that match and ignore those that do not. Based on this observation, we propose a much simpler procedure for updating the weighting matrix $W$: instead of using $\| ξ_{j} \|_{2}^{2}$ directly in the classification, a weighted version is used.

Denoting by $η_{j}$ the $K_{0}$th smallest element of $\{| ξ_{j}(n) |\}_{n = 1}^{N}$, we define the (diagonal) weighting matrix for $ξ_{j}$ as

$W_{j}(n, n) = \begin{cases} 1, & | ξ_{j}(n) | \leq η_{j} \\ 0, & \text{otherwise} \end{cases}, \quad \forall j$ (29)

The query face image is then classified with

$\hat{j} ≜ \arg\min_{j} \| W_{j} ξ_{j} \|_{2}^{2}$ (30)

We use Alg_RLNSRC to denote the classification algorithm specified in equations (28) to (30).
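A compact NumPy sketch of equations (28) to (30) follows. Assumptions: a projected-gradient solver replaces `lsqnonneg.m`, and the sketch keeps exactly $K_0$ residual entries per class, whereas (29) keeps every entry not larger than $η_j$, so ties at $η_j$ could keep a few more. The function names and example sizes are hypothetical.

```python
import numpy as np


def nnls_pg(A, b, n_iter=2000):
    """Projected-gradient NNLS, used here in place of lsqnonneg for (28)."""
    t = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        s = np.maximum(s - t * (A.T @ (A @ s - b)), 0.0)
    return s


def alg_rlnsrc(Psi_list, x, K0):
    """Sketch of equations (28)-(30).

    Returns the predicted class and the weighted residual energies ||W_j xi_j||^2.
    """
    scores = []
    for Psi_j in Psi_list:
        xi_j = x - Psi_j @ nnls_pg(Psi_j, x)    # residual of the nonnegative fit (28)
        kept = np.sort(np.abs(xi_j))[:K0]       # entries with W_j(n, n) = 1 in (29)
        scores.append(float(np.sum(kept ** 2)))
    scores = np.array(scores)
    return int(np.argmin(scores)), scores       # decision rule (30)


rng = np.random.default_rng(9)
Psi_list = [rng.standard_normal((40, 5)) for _ in range(3)]
x = Psi_list[0] @ np.array([1.0, 0.5, 2.0, 0.0, 1.5])   # query from class 0
j_hat, scores = alg_rlnsrc(Psi_list, x, K0=30)
```

Unlike the RSC loop, the weights here come from a single sort of each residual vector, with no outer reweighting iterations.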

Remark 3.1

As seen from equation (29), the weighting matrix can be obtained easily, without an iterative procedure or statistical information about the residuals. Experiments show that this weighting strategy is very effective in enhancing recognition performance.

We note that the choice of weighting factors given by equation (29) is intended to keep the $K_{0}$ smallest entries in the residual vector $ξ_{j}$ for classification. A more general choice for the weighting factors is

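$W(n, n) = λ_{n}, \quad \text{s.t.} \quad \sum_{n = 1}^{N} λ_{n} = 1, \quad λ_{n} \geq 0, \ \forall n$

Clearly, equation (29) corresponds to the case where $λ_{n} = 1/K_{0}$ for the $K_{0}$ smallest entries in the residual vector $ξ_{j}$ and $λ_{n} = 0$ otherwise: this choice only scales every $\| W_{j} ξ_{j} \|_{2}^{2}$ by the same factor $1/K_{0}^{2}$ and therefore leaves the decision in equation (30) unchanged.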

In many situations, a query face image may not belong to any of the $J$ classes, in which case it should not be classified. In the open-universe test, we use the relative residual index (RRI) to determine whether or not the query image belongs to one of the $J$ classes:

$RRI(\hat{j}) ≜ \frac{\| W_{\hat{j}} ξ_{\hat{j}} \|_{2}^{2}}{\sum_{j \neq \hat{j}} \| W_{j} ξ_{j} \|_{2}^{2}}$ (31)

Since $\| W_{\hat{j}} ξ_{\hat{j}} \|_{2}^{2}$ is the smallest of the $J$ values $\{\| W_{j} ξ_{j} \|_{2}^{2}\}$ (see equation (30)), the denominator in equation (31) is at least $J - 1$ times the numerator, so that $0 \leq RRI(\hat{j}) \leq 1/(J - 1) < 1$ for $J > 2$. Now, let $τ$ be the RRI threshold. The class $\hat{j}$ obtained with equation (30) is accepted (i.e. the query image is accepted as belonging to that class) if and only if $RRI(\hat{j}) < τ$.

Suppose that we have $K$ query images, of which $K_{τ}$ are accepted for classification. The recall rate is then defined as

$r_{τ} ≜ \frac{K_{τ}}{K}$

Furthermore, let $k_{τ}$ be the number of accepted images that are correctly classified. To evaluate the actual performance of the recognition, we use the following index to represent the actual recognition rate

$γ_{τ} ≜ \frac{k_{τ}}{K_{τ}}$
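The open-universe bookkeeping can be sketched in a few lines of NumPy: compute the RRI of equation (31) for each query, accept it if the RRI is below $τ$, and derive the recall rate $r_τ$ and the actual recognition rate $γ_τ$. The function names and the label `-1` for distractors are hypothetical conventions.

```python
import numpy as np


def rri(scores):
    """Equation (31): smallest weighted residual divided by the sum of the others.

    Assumes at least one of the other scores is positive.
    """
    scores = np.asarray(scores, dtype=float)
    j_hat = int(np.argmin(scores))
    return j_hat, scores[j_hat] / np.delete(scores, j_hat).sum()


def open_universe_rates(score_rows, true_labels, tau):
    """Recall rate r_tau and actual recognition rate gamma_tau for threshold tau.

    score_rows[q] holds the J values ||W_j xi_j||^2 of query q; distractors carry a
    label (here -1, a hypothetical convention) that matches no class.
    """
    accepted = correct = 0
    for scores, label in zip(score_rows, true_labels):
        j_hat, r = rri(scores)
        if r < tau:                       # accept the query for classification
            accepted += 1
            correct += int(j_hat == label)
    recall = accepted / len(true_labels)
    gamma = correct / accepted if accepted else float("nan")
    return recall, gamma


rows = [[1.0, 10.0, 10.0], [5.0, 5.5, 6.0], [1.0, 10.0, 10.0]]
recall, gamma = open_universe_rates(rows, [0, 1, -1], tau=0.2)
# recall = 2/3 (the ambiguous second query is rejected), gamma = 0.5 (the distractor is misaccepted)
```

Sweeping `tau` traces out the trade-off between $r_τ$ and $γ_τ$: a smaller threshold rejects more queries but makes the accepted ones more reliable.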

Obviously, both $r_{τ}$ and $γ_{τ}$ are functions of the RRI threshold $τ$. Before turning to the “Experiment results” section, we make the following remark.

Remark 3.2

Lack of training samples is an important factor affecting the performance of representation-based classification techniques. Since the human face is approximately symmetric, a second training image can be obtained from each face image by mirroring it, for example with the MATLAB command fliplr.m. In this way, the number of training samples used to build the dictionary $Ψ$ is doubled. This augmentation is implemented in our system.
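The mirroring step can be sketched with NumPy, where `np.fliplr` plays the role of MATLAB's `fliplr`. The function name is hypothetical, and the sketch assumes that the mirrored images would then pass through the same frontalization, ADLBP, and projection steps before entering $Ψ$.

```python
import numpy as np


def mirror_augment(images, labels):
    """Append the left-right mirror of every training image, doubling the set.

    np.fliplr is NumPy's counterpart of MATLAB's fliplr.
    """
    mirrored = [np.fliplr(img) for img in images]
    return list(images) + mirrored, list(labels) + list(labels)


images = [np.arange(6).reshape(2, 3)]
aug_images, aug_labels = mirror_augment(images, [7])
# aug_images[1] is [[2, 1, 0], [5, 4, 3]] and aug_labels is [7, 7]
```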

Experiment results

In this section, we examine the performance of the proposed CS-RLNSRC system for FR and compare it with some existing FR systems. The comparison has two portions. In the first, several existing FR systems that use representation-based classification and are structurally similar to ours are implemented for comparison; in the second, three FR systems implemented with different strategies are used for comparison.

The Labeled Faces in the Wild (LFW) data set is used in our experiments. It contains 13,233 web images of 5749 celebrities. A subset of the LFW data set with $J = 158$ persons is used, in which $L_{j} = 10$ images of each person with various poses are used for training.

Portion I—RC-based FR systems

It should be pointed out that, in this subsection, all the systems have the same structure with four modules, the first three of which are identical to those of our proposed system. Hence, the systems differ only in how the classification is done. We consider four representation-based classification algorithms: Alg_RLNSRC, defined in the “An improved RSC” section, denotes the classification algorithm used in our proposed CS-RLNSRC system, while Alg_SRC, Alg_LRC, and Alg_WL denote the classification algorithms reported in Wright et al.,²⁷ Naseem et al.,²⁸ and Mairal et al.,²⁹ respectively.

Six images of a person are displayed in Figure 2. These images are pre-processed for frontalization, and the resulting images $(90 \times 90)$ are shown in Figure 3, where frontalization of the fifth image failed.

Figure 4 displays the transformation of the first frontalized image in Figure 3 into its $32 \times 32$ ADLBP feature image, which yields an ADLBP vector $y \in ℜ^{M \times 1}$ with $M = 1024$.

Figure 4.

ADLBP extraction process: (a) frontalized image, (b) LBP, (c) DLBP, and (d) ADLBP.

Set $N = 200$. Using the training images and the optimal sensing matrix design described in the “Dimension reduction” section, we obtain a sensing matrix $Φ \in ℜ^{N \times M}$ that projects the ADLBP vector $y$ onto $x = Φ y$. These vectors are then classified with the four algorithms: Alg_RLNSRC, Alg_SRC, Alg_LRC, and Alg_WL.

Effect of the parameter $η$

We note that in designing the sensing matrix, one has to choose the parameter $η$ . Figure 5 shows the relationship between the recognition rate $r_{ec}$ and the parameter $η$ .

Figure 5.

The parameter $η$ versus the recognition rate $r_{ec}$ .

As shown in Figure 5, the recognition rate can be increased with a proper choice of $η$. It should be pointed out that the optimal $η$ depends on the training samples and has to be determined experimentally. In what follows, $η$ is fixed at 0.16.

Effect of the number of training samples $L_{j}$

Table 1 displays the effect of $L_{j}$ on the recognition rate of the FR systems implemented using different representation-based classification algorithms.

Table 1.

The number $L_{j}$ versus recognition rate $r_{ec}$ (%).

$L_{j}$	5	7	9
Alg_WL	55.80	67.83	72.40
Alg_SRC	74.33	77.34	87.60
Alg_LRC	73.30	78.52	88.60
Alg_RLNSRC	78.61	87.94	92.25

The best value in each column is obtained by Alg_RLNSRC.

As displayed in Table 1, increasing the number of training samples enhances the recognition rate. This is consistent with equation (21). The results also indicate that our proposed algorithm outperforms the others.

Effect of data compression ratio M/N

We fix $M = 1024$ and vary $N$. Figure 6 depicts the effect of $N$ on the recognition rate $r_{ec}$.

Figure 6.

Relationship between N and the recognition rate $r_{ec}$ .

One observes that, for each algorithm, the recognition rate increases with $N$. This is expected and consistent with equation (21), whose bound decreases as $N$ grows. Once again, our proposed algorithm performs best among the four algorithms.

Effect of feature extraction

Let $Y_{0}$ denote the matrix representing a raw face image. As defined before, $Y$ denotes the corresponding ADLBP features. Here, we examine how the choice of features affects the classification. Table 2 shows how the recognition rate of the four classification algorithms changes with the feature extraction method, where $Y_{LBP}$ and $Y_{DLBP}$ are the LBP and downsampling LBP features of $Y_{0}$, respectively.

Table 2.

Feature selection versus recognition rate $r_{ec}$ (%).

Feature	$Y_{0}$	$Y_{LBP}$	$Y_{DLBP}$	$Y$
Alg_WL	50.29	53.7	70.56	87.24
Alg_SRC	58.89	60.87	76.63	90.60
Alg_LRC	59.04	61.01	77.48	91.40
Alg_RLNSRC	59.70	68.33	80.17	94.74

For each algorithm, the ADLBP feature $Y$ yields the highest recognition rate among the features considered. The price paid is an increase in computational complexity.

Open-universe experiment

We add 8909 face images as distractors to the set of testing images, and then run the four FR systems. For a given $τ$ , we can calculate $r_{τ}$ and $γ_{τ}$ . The relationship between the two is depicted in Figure 7, where $N = 200$ is used.

Figure 7.

Relationship between the actual recognition rate $γ_{τ}$ and the recall rate $r_{τ}$ .

As seen, the proposed Alg_RLNSRC outperforms all the others, and $γ_{τ}$ can reach 95% at a recall rate of $r_{τ} = 70\%$. Experiments show that $γ_{τ}$ increases with $N$ for the same $r_{τ}$: with $N = 400$ and $r_{τ} = 80\%$, $γ_{τ} = 95\%$ (see Figure 7).

Portion II—comparison with differently structured systems

In the previous subsection, we compared our proposed FR system with three systems that share its structure; the main difference is that each system uses a different representation-based classification algorithm.

In this subsection, we compare our proposed system with three other FR systems that use significantly different strategies. Here, Alg_RLNSRC denotes our proposed system; Alg_G, the system that uses an efficient classifier for large-scale images;³⁰ Alg_Hassner, the system that uses a hybrid algorithm, which achieved the highest score on the LFW challenge in the Image-Restricted, Label-Free Outside Data category;¹³ and Alg_Chen, the system that uses learned high-dimensional features for classification.³¹

As indicated in Table 3, the 3D-based frontalization used in Alg_RLNSRC and Alg_Hassner improves the recognition performance greatly compared with Alg_G. Alg_Chen achieves a comparable result, and our proposed system outperforms Alg_Chen on the database used in the experiments.

Table 3.

Recognition rate $r_{ec}$ (%) versus FR systems.

System	$r_{ec}$
Alg_G	84.13
Alg_Hassner	91.65
Alg_Chen	93.18
Alg_RLNSRC	94.74

FR: face recognition.

Conclusion

In this article, a novel framework has been proposed in which a 3D-based frontalization strategy is adopted as preprocessing and the ADLBP features of the frontalized images are used for recognition. In addition, an optimized projection matrix has been designed to reduce the implementation complexity, and an improved RSC algorithm has been derived for classification using the lower dimensional measurements. Experimental results for closed- and open-universe tests on the LFW data set demonstrate the effectiveness of the proposed approach and show that our system outperforms the RC-based FR systems compared as well as three FR systems with significantly different structures.

To make the proposed FR system suitable for real-time applications, a more efficient algorithm should be developed for ADLBP feature extraction. A key to the success of the proposed FR system is the prevention of overfitting by the nonnegative coefficient constraint. A similar phenomenon has been noted in DL.^32,33 DL has become a powerful tool and has been used in many areas. How to embed our approach in the DL context is another direction for future investigation.

Footnotes

The authors would like to thank the reviewers for the detailed review, comments, and suggestions, which helped the authors improve the quality of this paper significantly.

Handling Editor: Hiram Ponce

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Science & Technology Projection of Zhejiang province (STPZP) 2017C33119; STPZP Education Department Y201430687.

ORCID iD

Aihua Yu

References

1. Tolba, El-Baz, El-Harby AA. Face recognition: a literature review. Int J Sig Proces 2006; 2(1): 88–103.

2. Turk, Pentland. Eigenfaces for recognition. J Cogn Neurosci 1991; 3(1): 71–86.

3. Wang, Tang, et al. Atomic representation-based classification: theory, algorithm, and applications. IEEE T Pattern Anal 2019; 41(1): 6–19.

4. Yan, et al. Face recognition using Laplacianfaces. IEEE T Pattern Anal 2005; 27: 328–340.

5. Jaiswal. Evaluation of face recognition methods. J Glob Res Comput Sci 2011; 2(7): 478–482.

6. Bioucas-Dias, Figueiredo MAT. A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE T Image Process 2007; 16(12): 2992–2996.

7. Wang, Zhou, Zhang, et al. The non-convex sparse problem with nonnegative constraint for signal reconstruction. J Optimiz Theory App 2016; 170(3): 1009–1025.

8. Cui, Chang, Shan, et al. Joint sparse representation for video-based face recognition. Neurocomputing 2014; 135(8): 306–312.

9. Krizhevsky, Sutskever, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Proceedings of the international conference on neural information processing systems, Tahoe, NV, 3–6 December 2012, pp.1097–1105. New York: ACM.

10. Wang, Soong, et al. A sparse and low-rank approach to efficient face alignment for photo-real talking head synthesis. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, Prague, 22–27 May 2011, pp.1397–1400. New York: IEEE.

11. Liao, Jain, Li SZ. A fast and accurate unconstrained face detector. IEEE T Pattern Anal 2016; 38(2): 211–223.

12. Wagner, Wright, Ganesh, et al. Toward a practical face recognition system: robust alignment and illumination by sparse representation. IEEE T Pattern Anal 2012; 34(2): 372–386.

13. Hassner, Harel, Paz, et al. Effective face frontalization in unconstrained images. In: Proceedings of the 2015 IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, 7–12 June 2015, pp.4295–4304. New York: IEEE.

14. Gao, Tian CN. Multi-view face recognition based on tensor subspace analysis and view manifold modeling. Neurocomputing 2009; 72(16): 3742–3750.

15. Yang, et al. Integrate the original face image and its mirror image for face recognition. Neurocomputing 2014; 131(7): 191–199.

16. Yang, Zhang, Yang, et al. Robust sparse coding for face recognition. In: Proceedings of the computer vision and pattern recognition (CVPR), Colorado Springs, CO, 20–25 June 2011, pp.625–632. New York: IEEE.

17. Candès, Wakin MB. An introduction to compressive sampling. IEEE Signal Proc Mag 2008; 25(2): 21–30.

18. Donoho DL. Compressed sensing. IEEE T Inform Theory 2006; 52(4): 1289–1306.

19. Foucart, Koslicki. Sparse recovery by means of nonnegative least squares. IEEE Signal Proc Let 2014; 21(4): 498–502.

20. Lawson, Hanson RJ. Solving least squares problems. Englewood Cliffs, NJ: Prentice Hall, 1974.

21. Ojala, Pietikainen, Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE T Pattern Anal 2002; 24(7): 971–987.

22. Bouwmans, Zahzah EH. Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance. Comput Vis Image Und 2014; 122(4): 22–34.

23. Haupt, Rui, Nowak, et al. Compressive sampling for signal classification. In: Proceedings of the 2006 fortieth Asilomar conference on signals, systems and computers, Pacific Grove, CA, 29 October–1 November 2006, pp.1430–1434. New York: IEEE.

24. Elad. Optimized projections for compressed sensing. IEEE T Signal Proces 2007; 55(12): 5695–5702.

25. Zhu, et al. On joint optimization of sensing matrix and sparsifying dictionary for robust compressed sensing systems. Digit Signal Process 2018; 73: 62–71.

26. Bai, Sun, et al. Face recognition based on optimized projections for distributed intelligent monitoring systems. Int J Distrib Sens N 2016; 2016(1): 1–13.

27. Wright, Yang, Ganesh, et al. Robust face recognition via sparse representation. IEEE T Pattern Anal 2009; 31(2): 210–227.

28. Naseem, Togneri, Bennamoun. Linear regression for face recognition. IEEE T Pattern Anal 2010; 32(11): 2106–2112.

29. Mairal, Bach, Ponce, et al. Task-driven dictionary learning. IEEE T Pattern Anal 2012; 34(4): 791–804.

30. Ortiz, Becker BC. Face recognition for web-scale datasets. Comput Vis Image Und 2014; 118(1): 153–170.

31. Chen, Cao, Wen, et al. Blessing of dimensionality: high-dimensional feature and its efficient compression for face verification. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition, Portland, OR, 23–28 June 2013, pp.3025–3032. New York: IEEE.

32. Bengio, Courville, Vincent. Representation learning: a review and new perspectives. IEEE T Pattern Anal 2012; 35(8): 1798–1828.

33. Ranzato, Boureau, LeCun. Sparse feature learning for deep belief networks. In: Proceedings of the 20th international conference on neural information processing systems, Vancouver, BC, Canada, 3–6 December 2007, pp.1185–1192. New York: IEEE.

A novel framework for face recognition using robust local representation–based classification
