Abstract

Conventional methods for car styling analysis require expert knowledge. To avoid this requirement, this article proposes a machine learning–based method for car frontal styling analysis that needs no expert-engineered features. We aim to identify group behaviors in car styling, such as the degree of brand styling consistency among different automakers and car styling patterns. Brand styling consistency is treated as a group behavior and is formulated as a brand classification problem, which is solved with the PCANet for automatic feature encoding and a support vector machine for feature-based classification. The brand styling consistency can thus be measured by the classification accuracy. To perform the analysis, a car frontal styling database with 23 brands is first built. To present the brand styling patterns discovered in classification, a decoding method is proposed that maps the features salient for brand classification back to the original images, revealing salient styling regions. To give a direct impression of brand styling characteristics, frontal styling representatives of several brands are presented as well. This study contributes to efficient identification of brand styling consistency and visualization of brand styling patterns without relying on expert experience.

Keywords

Machine learning styling analysis car brand styling styling consistency classification

Introduction

Car styling has received much research attention recently because of its significance in presenting the brand identity¹ and its influence on customers’ decisions.² Most previous works focus on the analysis of stylistic features, such as lines and shapes,^3,4 extracted by experts. By calculating pre-defined geometric quantities (e.g. length and curvature) of the stylistic features, these works relate the features to certain semantics. While these methods are effective for certain tasks within a small dataset, they suffer from two limitations. On one hand, because the pre-defined features are defined for specific analysis tasks, they may fall short in others, so these methods are not readily applicable to various analysis tasks. On the other hand, defining these features requires much expert knowledge and identifying them requires much human labor, so the methods do not generalize to large-scale data.

Due to the two limitations stated above, most previous works aim to analyze styling design of individual cars or of a small group of examples. However, we argue that there are some tasks that shall also be investigated within a much larger dataset, for instance, the brand consistency of car styling. This can be justified by the fact that the brand consistency is a collective response of cars among various brands. It would be prohibitive to apply the conventional methodology to this inquiry.

To address this difficulty and to discuss the brand styling consistency as a group behavior of cars, we propose a machine learning–based method which requires no expert-engineered features. We collect a large set of car frontal view images and aim to measure the brand consistency and discover the shared brand patterns via the machine learning method. In particular, we cast the brand styling consistency problem as a brand classification problem, which is well suited to the machine learning methodology. By evaluating the classification accuracy in the prediction stage, we measure the degree of the styling consistency of a car brand. Meanwhile, the brand styling patterns can be revealed through the classification task with the proposed decoding technique.

The proposed machine learning method is composed of three stages: feature encoding, brand classification, and an additional decoding technique for visualizing the brand patterns. First, we adopt the PCANet⁵ as an encoding mechanism for automatically extracting styling features. While deep neural networks are widely applied to diverse tasks, the PCANet, a shallow network architecture, is chosen partly because it does not require a large-scale dataset, unlike deep learning architectures, and partly because of its simplicity. Second, the brand classification is performed with the features encoded by the PCANet and a linear multi-class support vector machine (SVM).⁶ The trained classifier for each brand turns out to be a vector in the feature space; therefore, we treat the classifier for each brand as the brand pattern. Third, we further provide a decoding technique to map the high-dimensional feature back to the image domain, revealing the salient stylistic features of the brands.

In the results and discussion, we first present the collected car frontal view image dataset on which the classification task is performed. The styling consistency across the collected brands is investigated and several brands are discussed to support our methodology. Then, the decoding technique is applied to reveal and visualize the shared brand patterns. The results show that the styling patterns of several brands discovered by our machine learning method closely match the public impression of these brands. Finally, to provide useful references for designers, frontal styling representatives of several brands extracted with our method are presented as well. Machine learning methods need a sufficient amount of data for training the model. In this article, a large-scale car frontal styling dataset is therefore built based on 23 popular brands in China.

The rest of the article is structured as follows. The research background is reviewed in section “Related works,” followed by the introduction of proposed machine learning method. Section “Experiments and discussion” presents experimental results with discussion in detail, and conclusions are made in section “Conclusion.”

Related works

Experience dependent car styling analysis

Most works in car styling analysis rely on styling features defined by humans in a preprocessing step. McCormack et al.¹ analyzed Buick’s brand identity through a shape grammar scheme based on feature lines extracted from Buick’s styling. Based on the extracted car-side silhouette, Hyun et al.^4,7 analyzed the similarity between car styles among car brands. Hsiao⁸ built a consultative program for the design process. It employed adjectival image words to represent the feelings of a customer toward a car, thereby building a connection between the car styles and image words. Tian et al.⁹ proposed an integrated analytic hierarchy process–technique for order preference by similarity to ideal solution (AHP-TOPSIS) method for the automotive style design assessment and interviewed 20 experts about the performance factors corresponding to styling elements.

Despite the remarkable successes that pre-defined features have achieved in car styling analysis, these methods depend heavily on human abilities in perception, abstraction, and extraction of styling features. Therefore, these studies require considerable expert knowledge and intensive labor, and they do not generalize to large-scale data.

Car brand styling

Styles could be classified not only by groups, schools, regions, and periods in the art¹⁰ but also by brands, as a vehicle representing brand culture and characteristics. Person et al.¹¹ discussed in detail the factors influencing decision-making in product styling, noting that similar styling is more acceptable from a branding perspective. Karjalainen¹² suggested analytical methods for the grasp of visual brand recognition based on several car brands. Ranscombe et al.³ proposed a decomposition method to investigate the influence of esthetic features on brand recognition of vehicles. Their experimental results indicate that esthetic features in the front view have the greatest influence on consumers’ perception of brand, which is also demonstrated in Burnap et al.¹³ and Hyun et al.¹⁴ Abidin et al.¹⁵ introduced styling DNA in the perception of the brand image by identifying character traits. With feature shapes extracted from the appearance of products, Ranscombe et al.¹⁶ proposed calculations of geometric properties to conduct similarity analysis.

The studies above illustrate the importance of styling for a brand. However, most of them analyze brand styling through feature line–based methods, which still require much prior knowledge. To make the car styling analysis more objective and intelligent, we propose a complementary data-driven styling analysis approach, which analyzes car brand styling by applying machine learning methodology.

Style recognition through machine learning

With the surge of machine learning methods, researchers have applied these methods to the stylistic analysis or recognition of paintings, sketches, and geography.^17–21 Lee et al.²² proposed a visual data mining approach that discovers connections between recurring midlevel visual elements in historical and geographic image collections. Chang and Chen²³ established a relationship between car profile characteristics and consumers’ image perception using a back-propagation neural network. Similar work can be found in Yumer et al.²⁴ as well. These studies demonstrate a remarkable ability of machine learning in pattern discovery. In our work, we take advantage of these methods in car recognition^25–29 to classify styling in terms of brands, and further measure the visual consistency and interpret the discovered patterns.

Recently, Pan et al.³⁰ proposed a deep learning method for identifying salient regions of design attributes in automobile styling. In particular, a deconvolution mechanism is applied to the visualization of regions for predicted attributes, which is similar to our pattern visualization. Although they studied the car styling in a collective manner, patterns behind the group behavior and the measurement of intensity in styling identity are not fully discussed, in part, because of the convolutional neural network (CNN) methods they employed. In our work, we adopt a simple and effective method to explicitly present and interpret discovered patterns in car brand styling.

A machine learning–based car brand styling analysis

In this section, a machine learning–based method is proposed to analyze the brand styling in car frontal face design and to discover any holistic feature patterns characterizing the brands. The brand styling analysis problem is formulated as a classification task which is suitable for the application of the machine learning method. This formulation is justified by the fact that the brand styling can be seen as a collective behavior of a cluster of cars from a certain brand. In this way, the degree of the styling consistency of a brand can be defined as the classification accuracy among the dataset.

Three components mainly contribute to the brand styling analysis framework: feature encoding, brand classification with the extracted features, and the decoding technique for holistic feature visualization. In what follows, we first introduce the PCANet, which extracts the holistic features by encoding the images into a high-dimensional feature space. This is followed by the description of the SVM classifier and the training process for recognizing the brand of each input image. The measure of the brand consistency is then presented as the classification accuracy on the given dataset. Finally, the decoding technique devised for visualizing the holistic feature patterns learned through the classification task is demonstrated.

Feature encoding with the PCANet

The PCANet is a network architecture which consists of multiple image processing techniques to encode each given image in the dataset into a high-dimensional vectorized representation. The space spanned by these vectorized codes is termed feature space, and each of the vectorized codes is called a feature vector. Thus, one can see the PCANet as a non-linear mapping from the image space to the feature space, both encoding the necessary information for recognizing the attributes, such as the brand in our task, associated with a specific data sample.

The PCANet contains three stages: two consecutive stages of cascaded principal component analysis (PCA), and the last output stage of a combination of binary hashing and block-wise histogram. This hierarchical architecture and the patch process ensure a rich and detailed feature representation. Figure 1 shows how the PCANet encodes an image of car front face sample to a feature vector. The details of the PCANet are introduced below.

Figure 1.

Pipeline of the PCANet encoding process.

Given N input grayscale images $\{I_{i}\}_{i = 1}^{N}$ of equal size $w \times h$ , the PCANet processes each sample image $I_{i}$ as follows. In the first stage of the PCA filtering, the PCANet starts by taking a patch P of size $k_{1} \times k_{2}$ centered at every pixel. Then, the mean intensity of the patch is subtracted from the intensity at each pixel in patch P. The rectangular patch is then vectorized into a column with a reversible transformation $M : R^{k_{1} \times k_{2}} \to R^{k_{1} k_{2}}$ , which means that the columns of patch P are concatenated into a long column vector $v$ , $v = M (P)$ . The column vectors computed from the patches centered at all pixels of the image then form a matrix ${\bar{X}}_{i}$ of size $(k_{1} k_{2}) \times c$ , where $c = (w - k_{1} + 1) (h - k_{2} + 1)$ . By applying the same process to all input images and grouping the resultant matrices ${\bar{X}}_{i}$ as follows, we have

$X = [{\bar{X}}_{1}, {\bar{X}}_{2}, \dots, {\bar{X}}_{N}] \in R^{k_{1} k_{2} \times Nc}$ (1)
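The construction of one matrix ${\bar{X}}_{i}$ can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `patch_matrix` is hypothetical, $k_{1}$ is taken along the image width and $k_{2}$ along the height, and only patches lying fully inside the image are used, which is what the column count $c$ given above implies.

```python
def patch_matrix(image, k1, k2):
    """Collect mean-removed patches of `image` as the columns of a matrix.

    `image` is a list of h rows with w entries each. A patch is assumed to be
    k1 pixels wide and k2 pixels high, and only patches that fit entirely
    inside the image are taken, giving c = (w - k1 + 1) * (h - k2 + 1) columns.
    """
    h, w = len(image), len(image[0])
    columns = []
    for top in range(h - k2 + 1):
        for left in range(w - k1 + 1):
            # M: concatenate the columns of the patch into one long vector.
            v = [image[top + r][left + col]
                 for col in range(k1) for r in range(k2)]
            mean = sum(v) / len(v)
            columns.append([x - mean for x in v])
    # Transpose so that every patch becomes a column: (k1 * k2) rows, c columns.
    return [list(row) for row in zip(*columns)]


# Toy 4 x 5 image (h = 4, w = 5) with 3-wide, 2-high patches.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
X_bar = patch_matrix(image, k1=3, k2=2)
print(len(X_bar), len(X_bar[0]))
```

Stacking the matrices of all $N$ images side by side then gives the matrix $X$ of equation (1).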

The covariance matrix $X_{cov}$ of $X$ is further computed. It is with the covariance matrix $X_{cov}$ that the first PCA filtering is performed. Specifically, we first compute the eigen decomposition of $X_{cov}$ . We then rank the eigenvectors with respect to their corresponding eigenvalues in a descending order and obtain the first $L_{1}$ eigenvectors.

Each eigenvector is a column vector in $R^{k_{1} k_{2}}$ and thus can be transformed back to a patch of size $k_{1} \times k_{2}$ using the inverse map of $M$ . Thus, we have a $k_{1} \times k_{2}$ kernel $W_{l}^{1}$ , where the subscript indicates that the kernel originates from the $l$th eigenvector and the superscript indicates the PCA filtering stage.
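A minimal NumPy sketch of this filter-learning step is given below. It is an illustration, not the authors' implementation: the function name `pca_filters` and the random stand-in for the patch matrix $X$ are assumptions, and the kernels are stored as arrays with $k_{2}$ rows and $k_{1}$ columns on the assumption that $k_{1}$ is the patch width.

```python
import numpy as np


def pca_filters(X, k1, k2, L1):
    """Return L1 kernels learned from a (k1 * k2) x Nc matrix of patches."""
    X_cov = X @ X.T / X.shape[1]               # covariance of mean-removed patches
    eigvals, eigvecs = np.linalg.eigh(X_cov)   # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:L1]     # indices of the L1 largest eigenvalues
    # Inverse of M: refill each patch column by column (Fortran order).
    return [eigvecs[:, j].reshape((k2, k1), order="F") for j in order]


# Random stand-in for the patch matrix X (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((25, 400))
X -= X.mean(axis=0)                            # remove the mean of every patch
filters = pca_filters(X, k1=5, k2=5, L1=8)
print(len(filters), filters[0].shape)
```

Scaling the covariance by the number of patches does not change the eigenvectors, so the resulting kernels are the same whether or not that factor is applied.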

Applying $W_{l}^{1}$ as convolution kernel to the image $I_{i}$ padded with zeros at the boundary, we have the output of the first PCA filtering stage

$I_{i}^{l} \dot{=} I_{i} * W_{l}^{1}, i = 1, 2, \dots, N; l = 1, 2, \dots, L_{1}$ (2)

where * means image convolution operation.
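Equation (2) can be illustrated with a small zero-padded convolution in plain Python. The helper name `conv2_same` is hypothetical, and the sketch assumes a true convolution (flipped kernel) with an odd-sized kernel, so that the output keeps the size of the input image.

```python
def conv2_same(image, kernel):
    """2-D convolution of `image` with `kernel`, zero padded to keep the size.

    Both arguments are lists of rows. The kernel is assumed to have odd height
    and width so that it has a well-defined centre.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + ph - i, x + pw - j    # flipped kernel index
                    if 0 <= yy < h and 0 <= xx < w:    # pixels outside count as zero
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(conv2_same([[1, 2, 3], [4, 5, 6]], identity))
```

Applying the function once per kernel $W_{l}^{1}$ yields the $L_{1}$ first-stage images of one input.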

This leads to $L_{1}$ filtered images for each input image $I_{i}$ , as shown in the second column of Figure 1. The PCA filtering can be considered as a weighting mechanism that emphasizes the major contributors to the eigenvectors with the $L_{1}$ largest eigenvalues. By consecutively weighting (or filtering) specific locations in the original image, one expects to extract salient features.

The second stage of the PCANet applies the PCA filtering to each filtered image output by the first stage. The output of the second stage is obtained as follows

$O_{i}^{l, ℓ} \dot{=} I_{i}^{l} * W_{ℓ}^{2}, \quad ℓ = 1, 2, \dots, L_{2}$ (3)

where $W_{ℓ}^{2}$ is the second-stage convolutional kernel corresponding to the $ℓ$th eigenvector obtained by performing PCA on the covariance matrix of the collection $\{I_{i}^{l}\}$ ($i = 1, \dots, N$; $l = 1, \dots, L_{1}$) and $L_{2}$ denotes the number of eigenvectors used in the second stage. This leads to $L_{1} L_{2}$ second-stage output images for each input. These correspond to the third column of Figure 1, where an array of output images is listed.

The third stage performs the binarization and merging of the outputs of the two consecutive PCA filtering stages. The binarization step $H (\cdot)$ converts the filtered images to binary images, with value 1 at pixels whose filtered value is positive and 0 otherwise. The merging step is a weighted summation of the $L_{2}$ secondary filtered images of each primary filtered image as follows

$T_{i}^{l} = \sum_{ℓ = 1}^{L_{2}} 2^{ℓ - 1} H (O_{i}^{l, ℓ})$ (4)
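The binarization and merging of equation (4) can be sketched as follows. The function name `hash_and_merge` is hypothetical and the images are plain lists of rows. Each pixel of the result is an integer between 0 and $2^{L_{2}} - 1$.

```python
def hash_and_merge(outputs):
    """Combine the L2 second-stage outputs of one first-stage image into T.

    `outputs` is a list of L2 equally sized images (lists of rows). H(.) is
    taken as 1 for strictly positive values and 0 otherwise, as in the text.
    """
    h, w = len(outputs[0]), len(outputs[0][0])
    T = [[0] * w for _ in range(h)]
    for idx, O in enumerate(outputs):      # idx corresponds to ell - 1
        weight = 2 ** idx                  # the factor 2^(ell - 1)
        for y in range(h):
            for x in range(w):
                if O[y][x] > 0:
                    T[y][x] += weight
    return T


# Three toy 1 x 2 outputs: bits (1, 0, 1) and (0, 1, 1) at the two pixels.
outputs = [[[0.5, -1.0]], [[-0.2, 3.0]], [[1.0, 1.0]]]
print(hash_and_merge(outputs))  # [[5, 6]]
```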

In this way, each primary filtered image corresponds to a weighted sum. Again, this process aims to emphasize the important regions found by the eigenvectors according to their rank. To perform classification, the $L_{1}$ resultant images are vectorized and concatenated into a long vector. To this end, each resultant image is first decomposed into $B$ equal-sized blocks (of size $BH \times BW$ , possibly overlapping).

The block size is closely related to the “resolution” of the final feature vector; the larger the block size, the lower resolution of the feature vector. This is relevant to the decoding process where the final output feature vector is mapped back to the blocks of an image, demonstrating the saliency of each block with respect to the brand classification and thus revealing the important pattern of the related brand. This will be discussed later in section “Styling pattern visualization via feature decoding.”

Then, a histogram (with $2^{L_{2}}$ bins) is computed in each block, and a vector, denoted as $Bhist (T_{i}^{l})$ , is obtained by concatenating the $B$ histograms. Finally, the PCANet output feature vector of each image $I_{i}$ is obtained by concatenating the $Bhist (T_{i}^{l})$ of all $L_{1}$ images, which is denoted as

$f_{i} = {[Bhist (T_{i}^{1}), \dots, Bhist (T_{i}^{L_{1}})]}^{T} \in R^{(2^{L_{2}}) L_{1} B}$ (5)
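The block-wise histogram and the concatenation in equation (5) can be sketched as below. For simplicity the sketch assumes disjoint blocks visited row by row, although the text allows overlapping blocks. The function names are hypothetical.

```python
def block_histograms(T, bh, bw, n_bins):
    """Concatenate the histograms of disjoint bh x bw blocks of an integer image."""
    h, w = len(T), len(T[0])
    feature = []
    for top in range(0, h - bh + 1, bh):
        for left in range(0, w - bw + 1, bw):
            hist = [0] * n_bins
            for y in range(top, top + bh):
                for x in range(left, left + bw):
                    hist[T[y][x]] += 1         # pixel value selects the bin
            feature.extend(hist)
    return feature


def pcanet_feature(T_images, bh, bw, L2):
    """Concatenate the block histograms of all L1 merged images into one vector."""
    f = []
    for T in T_images:
        f.extend(block_histograms(T, bh, bw, 2 ** L2))
    return f


# Toy case: L1 = 2 identical images, L2 = 2 (4 bins), two 2 x 2 blocks each.
T = [[0, 1, 2, 3],
     [1, 1, 3, 3]]
f = pcanet_feature([T, T], bh=2, bw=2, L2=2)
print(len(f))  # 2^L2 * L1 * B = 4 * 2 * 2 = 16
```

Because every entry of the feature vector is a histogram count, all entries are non-negative.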

These high-dimensional feature vectors extracted by the PCANet from the input dataset span the so-called feature space in which the classification is then performed.

Brand classification with SVM

The classifier employed here is the SVM, which was initially proposed by Cortes and Vapnik³¹ to solve two-group classification tasks and was later generalized to multi-class problems. Many efficient libraries have been developed, such as LIBSVM³² and LIBLINEAR.⁶ The main idea of the SVM is to find the optimal hyperplane with the normal vector $w$ in the (high-dimensional) feature space that maximizes the margin between two classes of data. The basic constraint for two-class classification is defined below

$d_{i} \cdot (w^{T} f_{i} + b) \geq 1, i = 1, 2, \dots, N$ (6)

where $f_{i}$ is the feature vector of a training sample in the dataset, $d_{i} \in \{- 1, 1\}$ is the corresponding ground-truth label of $f_{i}$ , and $w, b$ denote the two parameters to be trained for classification. Empirically, $w$ can be considered as a sequence of weights representing the importance of the elements in the feature vector of a sample. Therefore, we call $w$ the pattern vector, since this vector, trained for determining the category of a sample, encodes certain categorical information, or patterns, as will be shown later.

For multi-class classification tasks where a training dataset with m types of labels is given, a classifier is trained whose parameters $w_{j} (j = 1, 2, \dots, m)$ and $b$ are learned. Then, given a feature vector $f$ of a testing sample encoded by the PCANet, its predicted category T is defined as follows

$T = \arg\max_{j} (w_{j}^{T} f + b), \quad j = 1, 2, \dots, m$ (7)

Since the entries in $f$ are non-negative, it can be concluded from equation (7) that the positive entries in the pattern vector $w_{j}$ are of significance in classification. Therefore, pattern analysis can be achieved through decoding $w_{j}$ to reveal the importance of elements. This could be conducted by reversing the last stage in the PCANet.
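The prediction rule of equation (7) amounts to a few lines of code. The sketch below uses a hypothetical function name and toy pattern vectors, and it follows the equation in using a single bias $b$ for all categories.

```python
def predict_brand(f, W, b=0.0):
    """Return the index j maximising w_j^T f + b together with all scores.

    `W` is a list of m pattern vectors w_j and `f` a feature vector of the
    same length. A single bias b is used for all categories, as in the equation.
    """
    scores = [sum(w_k * f_k for w_k, f_k in zip(w_j, f)) + b for w_j in W]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example with m = 3 pattern vectors and a non-negative feature vector.
W = [[1, 0, -1], [0, 2, 0], [-1, 0, 1]]
print(predict_brand([1, 3, 0], W))  # (1, [1.0, 6.0, -1.0])
```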

According to equation (7), the prediction value $(w_{j}^{T} f_{i} + b)$ can be viewed as a measurement of a sample with the feature vector $f_{i}$ belonging to a certain category. The larger the prediction value with respect to the pattern vector $w_{j}$ of jth category, the closer the sample to the category’s pattern. In this way, the set of K samples in category A that maximize the prediction values are utilized to represent the styling of category A.
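Selecting the $K$ representatives of a brand then reduces to ranking the brand's samples by their prediction values. A minimal sketch, with a hypothetical function name and toy data:

```python
def top_k_representatives(brand_features, w_j, b, K):
    """Indices of the K samples of one brand with the largest w_j^T f + b."""
    scores = [sum(w * x for w, x in zip(w_j, f)) + b for f in brand_features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:K]


# Four toy feature vectors of one brand; the scores are 1, -1, 0 and 3.
features_A = [[1, 0], [0, 1], [2, 2], [3, 0]]
print(top_k_representatives(features_A, w_j=[1, -1], b=0.0, K=2))  # [3, 0]
```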

Measuring styling consistency of a brand

An intuitive idea of the degree of brand styling consistency is that a brand has higher styling consistency if more car models of the brand can be correctly identified based on their styling and the discovered brand styling pattern. To turn this idea into a metric, we use the brand recognition rate to evaluate the brand styling consistency. A common practice in classification tasks is to split the dataset into two subsets, one for training and the other for testing. However, our goal is not to provide a generic classifier for future classification tasks; instead, we would like to use the classifier as a tool for measuring the degree of styling consistency of a brand within the given data collection. We therefore adopt the leave-one-out scheme.

Given a car image $I_{i}$ ($i \in \{1, 2, \dots, N\}$) to be classified, the remaining images $\{I_{j}\}$ ($j \in \{1, 2, \dots, N\}, j \neq i$) in the dataset are used to train a model-specific classifier $C_{i}$ , which is then employed to recognize the brand of $I_{i}$ . This process is repeated by iterating $i$ from 1 to N. The recognition accuracy of a certain brand is considered as the measure of the styling consistency of this brand and is computed as follows

$C I_{A} = N_{A}^{r} / N_{A}$ (8)

where $C I_{A}$ is the degree of styling consistency of brand A, $N_{A}^{r}$ the number of correctly recognized models in brand A, and $N_{A}$ the number of all models in brand A.
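The leave-one-out procedure and equation (8) can be sketched as below. To keep the example self-contained, a nearest-centroid classifier is swapped in for the SVM used in the article. It is only a stand-in, and all function names and data are hypothetical.

```python
from collections import Counter


def train_centroids(features, labels):
    """Stand-in classifier (not an SVM): one mean feature vector per brand."""
    sums, counts = {}, Counter(labels)
    for f, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(centroids, f):
    """Return the brand whose centroid is closest to f."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(c, f))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def loo_consistency(features, labels):
    """Leave-one-out recognition rate N_A^r / N_A for every brand A."""
    correct, total = Counter(), Counter(labels)
    for i, (f, lab) in enumerate(zip(features, labels)):
        rest_f = features[:i] + features[i + 1:]
        rest_l = labels[:i] + labels[i + 1:]
        model = train_centroids(rest_f, rest_l)    # classifier C_i
        if classify(model, f) == lab:
            correct[lab] += 1
    return {lab: correct[lab] / total[lab] for lab in total}


# Toy data: brand "A" is compact, brand "B" contains one outlying sample.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [5, 5]]
labels = ["A", "A", "A", "B", "B", "B"]
print(loo_consistency(features, labels))
```

In this toy case the outlying sample of brand B is assigned to brand A when it is left out, so brand B obtains a lower consistency value than brand A.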

While the styling consistency of a brand is evaluated based on the correct recognitions, we can investigate the similarity among brands by evaluating the false recognitions. A falsely recognized car model of a brand indicates that its styling is similar to the styling pattern of the recognized brand according to our framework, and thus the number of falsely recognized models from one brand to another reflects the similarity of these two brands in styling. Accordingly, a visualization scheme is employed to present the styling similarity among brands. Figure 2 shows a diagram of brand styling similarity. Region A refers to a collection of samples recognized correctly as brand A. Region BA contains samples belonging to brand B but falsely recognized as brand A. The other regions in Figure 2 are interpreted likewise.
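Counting the false recognitions between brand pairs is straightforward once the leave-one-out predictions are available. In the sketch below the function names and the labels are made up for illustration, and the size of the boundary region between two brands is taken as the sum of the false recognitions in both directions.

```python
from collections import Counter


def false_recognitions(true_brands, predicted_brands):
    """Count models of brand X falsely recognized as brand Y, keyed by (X, Y)."""
    return Counter((t, p) for t, p in zip(true_brands, predicted_brands) if t != p)


def boundary_count(confusions, a, b):
    """Number of samples in the boundary region between brands a and b."""
    return confusions[(a, b)] + confusions[(b, a)]


# Made-up labels for illustration only.
true_brands = ["A", "A", "B", "C", "D"]
predicted = ["B", "A", "A", "A", "D"]
confusions = false_recognitions(true_brands, predicted)
print(boundary_count(confusions, "A", "B"))  # 2
```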

Figure 2.

Diagram for visualization of brands’ styling similarity.

Styling pattern visualization via feature decoding

Because a pattern vector also lies in the feature space, decoding can be achieved by inverting the feature encoding process. Recalling the block-wise histogram computation in the output stage of the PCANet, any vector in the feature space can be mapped back to blocks in the image domain. Equation (7) implies that the positive entries in the pattern vector are of significance in classification and thus can lead to essential information about brands. The combination of positive entries and their corresponding blocks therefore indicates salient locations in an image that define the category of this image during the classification task. Following this rationale, we visualize the salient area of a brand in an image and further consider the salient region as the styling pattern of the brand.

As mentioned above, the block size is relevant to the visualization of brand styling pattern. Suppose the size of a block is set to be $BH$ and $BW$ in height and width, respectively, and the blocks are disjoint to each other. Then, given an input sample with a height of $IH$ and a width of $IW$ , the dimension $D_{w}$ of the pattern vector $w$ is defined as follows

$D_{w} = L_{1} \times B \times N_{C}$ (9)

where $L_{1}$ is the number of eigenvectors chosen in the first stage of the PCANet; $B$ denotes the number of blocks in the last stage of the PCANet and equals $(IH / BH) \times (IW / BW)$ because the blocks do not overlap; and $N_{C}$ is the number of gray levels, ranging from 0 to 255, that a pixel of the merged images $T_{i}^{l}$ can take and thus equals 256 (that is, the $2^{L_{2}}$ histogram bins of equation (5) with $L_{2} = 8$). As stated before, the block size is related to the “resolution” of the final feature vector. To trade off between a high resolution and a low dimension of the final feature vector, we set the block size to $4 \times 4$ with zero overlapping.
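As a quick check of equation (9), the dimension can be evaluated for the ROI sizes and PCANet settings reported in the experiments of this article ($L_{1} = 8$, disjoint $4 \times 4$ blocks, $N_{C} = 256$). The function name is hypothetical, and reading the ROI sizes as height by width is an assumption that does not affect the result.

```python
def pattern_vector_dim(IH, IW, BH, BW, L1, N_C=256):
    """Dimension D_w = L1 * B * N_C for disjoint BH x BW blocks."""
    B = (IH // BH) * (IW // BW)        # number of disjoint blocks
    return L1 * B * N_C


print(pattern_vector_dim(48, 120, 4, 4, L1=8))   # sedan ROI: 737280
print(pattern_vector_dim(52, 112, 4, 4, L1=8))   # SUV ROI: 745472
```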

The proposed scheme for transforming a pattern vector into an image is shown in Figure 3. Specifically, following the output stage of the PCANet in reverse, a pattern vector is first divided into $L_{1}$ parts corresponding to the concatenation of the $L_{1}$ resultant images. Each part is then divided into $B$ sets, which are placed at their block positions to recover an image. Finally, the $N_{C}$ values in each set are summed, and the sum, after being mapped into the range 0–255, represents the value of each pixel in the corresponding block of the recovered image.
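The decoding steps can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, the blocks are assumed to be stored row by row, and the mapping into 0–255 is taken to be a min-max rescaling, which the text does not specify.

```python
def decode_pattern(w, L1, rows, cols, n_bins):
    """Turn a pattern vector into L1 block-level saliency maps scaled to 0-255.

    rows x cols is the grid of disjoint blocks (B = rows * cols). The vector is
    assumed to hold L1 parts, each made of B consecutive sets of n_bins values.
    """
    B = rows * cols
    assert len(w) == L1 * B * n_bins
    maps = []
    for l in range(L1):
        part = w[l * B * n_bins:(l + 1) * B * n_bins]
        # Sum the n_bins values of every block.
        sums = [sum(part[b * n_bins:(b + 1) * n_bins]) for b in range(B)]
        lo, hi = min(sums), max(sums)
        scale = 255.0 / (hi - lo) if hi > lo else 0.0   # assumed min-max mapping
        scaled = [round((s - lo) * scale) for s in sums]
        # Put the block values back on the block grid, row by row.
        maps.append([scaled[r * cols:(r + 1) * cols] for r in range(rows)])
    return maps


# Toy pattern vector: L1 = 1, a 1 x 3 block grid, 2 bins per block.
print(decode_pattern([1, 1, 0, -2, 3, 1], L1=1, rows=1, cols=3, n_bins=2))
```

Each value in a recovered map would then be painted over the $BH \times BW$ pixels of its block to obtain an image of the original size.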

Figure 3.

Process of decoding a portion of the pattern vector into the salient image. The pattern vector is first divided into $L_{1}$ parts; (1) denotes the expansion of the first part and (2) represents the summation and rearrangement of a part into an image.

Experiments and discussion

In this section, the implementation details of the proposed method are described first, including the introduction of a car frontal styling database (CFSDB) and the parameters chosen for training the PCANet/SVM framework. Then, experimental results are presented and discussed in three aspects: brand styling consistency, visualization of brand styling pattern, and representative brand styling.

CFSDB

To support the style-based car brand classification, a car styling database is required. There are some benchmark car datasets^33,34 in the computer vision community, in which cars are usually captured by traffic cameras or photographed from various viewpoints. However, styling features may not be holistically perceived and precisely extracted from these large-scale databases because of lighting, occlusions, and varying viewing angles. For our task of studying car styling within and across brands, a dedicated car styling database is needed. As stated in related works, the frontal view of a car reveals most of the brand’s style and has more impact on the visual perception of branding characteristics. Thus, we build the CFSDB to facilitate the car styling analysis research.

Since styling elements in car frontal view are mainly focused on headlights, grille, and bumper,³ we select the union of these details as the analyzed styling region, that is, styling region(s) of interest (ROI). Because of different aspect ratios of styling ROI in sedan and SUV, we build two datasets accordingly. Figure 4 shows part of car styling samples of Audi and Volkswagen in two datasets. Both datasets are well aligned in a front view. The sedan dataset contains 4726 images corresponding to 22 brands, and the SUV dataset contains 2441 images corresponding to 21 brands. These car brands are selected considering both sales and wide acceptance of high degree of brand styling; the selected 23 brands and their numbers are shown in Figure 5.

Figure 4.

(a) Audi samples in the sedan dataset, (b) Volkswagen samples in the sedan dataset, (c) Audi samples in the SUV dataset, and (d) Volkswagen samples in the SUV dataset.

Figure 5.

The numbers of car models in brands: (a) sedan dataset and (b) SUV dataset.

Parameter settings of PCANet/SVM

Before exploring styling patterns, we first determine a proper image size to balance classification accuracy and computational cost. Cross-validation experiments are conducted on 2200 samples randomly chosen from 22 sedan brands and 1260 samples from 21 SUV brands. The statistics of the experiments are shown in Figure 6, and the proper sizes of the sedan ROI and SUV ROI are 48 × 120 and 52 × 112 pixels, respectively.

Figure 6.

ROI size optimization: (a) sedan and (b) SUV.

Then, an experiment is conducted to find proper parameters of the PCANet, in which four parameters are considered: patch size, number of filters, block size, and block overlap ratio. The optimal results are shown in Table 1, which are also decided based on accuracy and computation cost. For the implementation of SVM, we adopted LIBLINEAR for its multi-classification strategy of one-versus-rest, in which the solver L₂-regularized L₂-loss support vector classification (dual)³⁵ is applied, leaving others as default setting.

Table 1.

Parameters in the PCANet.

Patch size	Number of eigenvectors ( $L_{1}$ , $L_{2}$ )	Block size	Block overlap ratio
$5 \times 5$	8, 8	$10 \times 10$	70%

With proper parameters, a leave-one-out cross-validation is first conducted to obtain the distribution of accuracies over all brands. Then, a classification of the complete dataset is conducted to obtain brand styling patterns, in which the block size and block overlap ratio are adjusted as described for the pattern decoding. It should be noted that this minor revision barely changes the classification performance but simplifies the visualization.

Styling consistency analysis

After the completion of leave-one-out classification experiment, we first visualize the distribution of models in sedan dataset based on their PCANet features and classification results. Figure 7 shows the whole distribution of classification results generated by t-distributed stochastic neighbor embedding (t-SNE),³⁶ with perplexity of 30, learning rate of 500, and the number of iterations as 1000. We can see that the models belonging to Audi, Benz, and Volkswagen are clustered and distant from others, reflecting their high degree of brand consistency and distinctive brand styling, while the models mixed together illustrate similarity among these corresponding brands.

Figure 7.

Visualization of classification results.

The detailed accuracies of sedan brands are shown in Figure 8. We can see that European brands like Audi, BMW, and Volkswagen achieve high recognition rates, which suggests that these brands possess a higher degree of styling consistency. For Asian brands such as Toyota, Hyundai, and Geely, the relatively low recognition rates indicate that more car models within these brands deviate from the discovered identities, reflecting a higher degree of styling diversity. More specifically, BMW and Audi achieve the highest accuracies with 100.00% and 99.73%, followed by Volvo, Skoda, and Volkswagen with 99.56%, 99.43%, and 99.41%, respectively, illustrating their high brand styling consistency. This accords with the perception in the market and with other studies.^7,15 In contrast, brands like Toyota, Hyundai, Nissan, Geely, and Chery, whose accuracies are relatively lower than the others, are considered to lack styling consistency across their models, at least from the frontal view. This may reflect the diversity of car styling within these brands.

Figure 8.

Accuracies of sedan brands.

To further verify the styling consistency or diversity of brands based on the classification results, we present a styling similarity visualization method to show the false recognized car models in a regional distribution manner.

Figure 9 shows the falsely recognized samples related to Audi, Volkswagen, and Chery, respectively. As shown in Figure 9, few models of Audi and Volkswagen are recognized as other brands, whereas many models of Chery are falsely recognized as other brands such as Geely, Volkswagen, and Hyundai. Besides confirming the conclusion drawn above that Audi and Volkswagen possess a higher degree of styling consistency whereas Chery shows styling diversity, the styling similarity can be preliminarily judged from the number of falsely recognized cross samples. Briefly, more samples in the boundary region between two brands indicate a higher degree of styling similarity between the corresponding brands. For example, a higher similarity is observed between Geely and Volkswagen, since more of their car models deviate from the pattern of their own brand and fall on the other side. It should be noted that this visualization can only provide a preliminary conclusion on similarity, and further validation, for example through a user study, is needed.

Figure 9.

Styling similarity analysis: (a) Audi, (b) Volkswagen, and (c) Chery.
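The idea of judging similarity from cross samples can be sketched as a count of misclassifications per unordered pair of brands. This is only an illustrative assumption about how the boundary samples might be tallied; the function name `cross_confusions` and the toy labels are hypothetical and do not reproduce Figure 9.

```python
from collections import Counter

def cross_confusions(true_brands, predicted_brands):
    """Count misclassified samples for each unordered pair of brands."""
    pairs = Counter()
    for t, p in zip(true_brands, predicted_brands):
        if t != p:
            # frozenset makes (Geely -> Volkswagen) and (Volkswagen -> Geely)
            # contribute to the same pair.
            pairs[frozenset((t, p))] += 1
    return pairs

# Toy labels invented for illustration only.
y_true = ["Geely", "Geely", "Volkswagen", "Chery", "Chery", "Audi"]
y_pred = ["Volkswagen", "Geely", "Geely", "Geely", "Hyundai", "Audi"]

pairs = cross_confusions(y_true, y_pred)
geely_vw = pairs[frozenset(("Geely", "Volkswagen"))]
print(geely_vw)
```

A larger count for a pair corresponds to more samples in the boundary region between the two brands and hence, tentatively, to a higher styling similarity.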

With the observations above, we can further examine the incorrectly recognized car models to analyze which models have similar styling across brands, as shown in Figure 9(a). From Figure 9(a), we can see that only one Audi model is recognized as Kia. This falsely recognized model was produced in approximately 2004, when Audi had not yet adopted the “wide-mouth” grille. Apparently, the styling of this model falls away from the styling pattern of Audi and is recognized as Kia, to which it appears closer. Among the car models from other brands that are recognized as Audi, two possess a “wide-mouth” grille similar to Audi’s styling. Thus far, we can only guess that the styling pattern discovered for Audi may be the “wide-mouth” grille, because two models are also recognized as Audi despite showing no styling that a human would perceive as similar to Audi’s pattern. Therefore, the pattern of each brand needs to be presented to support further explanation, which is discussed in the following section.

Visualization of brand styling pattern

Following the decoding process designed for visualizing brand genes, we now visualize the brand patterns extracted in the classification task. The brand genes are transformed into saliency maps, with warmer colors for higher saliency and colder colors for lower saliency. Each saliency map is overlaid on the representative car image of the corresponding brand. Figure 10 shows the detailed visualization process for Audi’s styling genes, in which the highlighted area in the recovered images represents the key styling region for the brand. Figure 11 shows the visualization results for Audi, BMW, Lexus, and Chery.

Figure 10.

Salient images from the styling pattern of Audi: (1) the dividing process of pattern vector, (2) the summarization, and (3) the rearrangement from a feature vector to a salient image in which salient regions are in red.
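The three steps named in the caption of Figure 10 (dividing, summarization, and rearrangement) can be sketched as follows. This is not the exact decoding procedure of this article: it assumes, purely for illustration, that the pattern vector is a concatenation of equal-length segments, one per image block in row-major order, and that each segment is summarized by its summed magnitude. The function name `pattern_to_saliency` and the toy vector are hypothetical.

```python
import numpy as np

def pattern_to_saliency(pattern, grid_rows, grid_cols):
    """Map a pattern vector to a normalized 2D saliency map.

    Assumes (for illustration) one equal-length segment per image block,
    stored in row-major block order.
    """
    pattern = np.asarray(pattern, dtype=float)
    n_blocks = grid_rows * grid_cols
    if pattern.size % n_blocks != 0:
        raise ValueError("pattern length must be a multiple of the block count")
    segments = pattern.reshape(n_blocks, -1)          # (1) divide into segments
    scores = np.abs(segments).sum(axis=1)             # (2) summarize each segment
    saliency = scores.reshape(grid_rows, grid_cols)   # (3) rearrange into a map
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency

# Toy pattern vector: 4 blocks (2 x 2 grid), 2 values per block.
pattern = np.array([0, 0, 1, -1, 4, 4, 0, 2], dtype=float)
saliency = pattern_to_saliency(pattern, 2, 2)
print(saliency)
```

The resulting map can then be upsampled to the image size and rendered with a warm-to-cold color scale before being overlaid on a representative car image.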

Figure 11.

Salient images from brand styling patterns: (a) Audi, (b) BMW, (c) Lexus, and (d) Chery. Warmer color indicates a high degree of saliency.

From Figure 11(a), we can see that the salient styling region for Audi is mainly distributed around the grille and logo area. This indicates that, besides the logo, the grille is the most distinctive styling feature for Audi. More specifically, however, the brighter areas are located in the upper and side regions of the grille. This deviates from the market’s perception of the complete shape of Audi’s “wide-mouth” grille. Nevertheless, we consider this result valuable because it locates finer details rather than relying on subjective human perception. It should also be acknowledged that human perception, including the ability to abstract, plays a significant role in styling assessment.

Figure 11(b) clearly shows that the center region of the grille plays the most important role in distinguishing BMW from other brands. Although this deviates from the perceived “kidney grille,” we can intuitively observe that it is the separation in the center that forms BMW’s double “kidney grille.” Therefore, the pattern discovered for BMW can be regarded as capturing more detailed styling features, which can be used to interpret the semantic shapes perceived by humans. Similarly, apart from the highlighted logo, the Lexus pattern shown in Figure 11(c) can explain the distinctive spindle grille.

However, as shown in Figure 11(d), nearly the whole region of Chery’s frontal styling is highlighted, with several slightly darker regions scattered around the logo. This may indicate that Chery has a variety of styling types in its products and lacks a unique and unified brand styling; the discovered pattern therefore has to rely on the holistic styling to achieve brand classification.

From the above results, we can see that the logos of some brands affect brand identification to some extent. A possible solution to this problem is to design a scheme that removes the logo without affecting the styling.

Together with the results of the consistency analysis, Figure 11 shows that brands with high styling consistency, such as Audi, BMW, and Lexus, usually have more centralized and distinctive styling elements, whereas Chery possesses a holistic styling with relatively low consistency.

Representative brand styling

To help readers perceive the brand styling characteristics directly, we present the representative styling of a brand. The representative styling of a brand can be considered as a combination and extraction of the several car models with the largest values in the brand.

Here, we take Changan as an example. First, we choose the 10 images that are considered to possess the most recognizable styling for the brand based on equation (7). Table 2 shows the top 10 images in the recognition of Changan. With these images, an eigenface technique^37,38 is then applied to generate the representative brand styling. The main representative eigenvectors in the brand styling space are visualized and then combined to generate the representative styling.

Table 2.

Top 10 images of most recognizable models from Changan.

Rank	Car images	Rank	Car images
1		6
2		7
3		8
4		9
5		10

Figure 12 shows the generated eigenfaces of Changan, ordered by eigenvalue. Combining the top five eigenfaces, we obtain the computed styling, which represents the brand styling. Because the generated styling is blurred, a professional stylist was consulted to render it; the rendered result is called the representative brand styling. Figure 13 presents the computed stylings and the rendered representative brand stylings of Changan, Audi, Nissan, and Volvo. A simple validation experiment was conducted to verify whether these stylings can be recognized as the brands they represent. The results support our conclusion.
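The eigenface step can be sketched with a standard principal component analysis of the selected images. The sketch below is an assumption-laden illustration rather than the procedure used to produce Figures 12 and 13: the function names, the random stand-in images, and in particular the combination rule (mean image plus eigenfaces weighted by their normalized eigenvalues) are hypothetical, since the article does not specify how the top five eigenfaces are combined. Note also that the sign of each eigenvector is arbitrary, so any signed combination inherits that ambiguity.

```python
import numpy as np

def eigenfaces(images, k):
    """Return (mean image, top-k eigenfaces, top-k eigenvalues).

    images: array of shape (n_images, height, width), grayscale.
    """
    n, h, w = images.shape
    flat = images.reshape(n, h * w).astype(float)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = (s ** 2) / (n - 1)
    return mean.reshape(h, w), vt[:k].reshape(k, h, w), eigenvalues[:k]

def combine(mean, faces, eigenvalues):
    """Assumed combination rule: mean plus eigenvalue-weighted eigenfaces."""
    weights = eigenvalues / eigenvalues.sum()
    return mean + np.tensordot(weights, faces, axes=1)

# Random stand-ins for the 10 most recognizable images of one brand.
rng = np.random.default_rng(0)
images = rng.random((10, 8, 6))

mean, faces, eigenvalues = eigenfaces(images, k=5)
computed = combine(mean, faces, eigenvalues)
print(faces.shape, computed.shape)
```

With real images, each eigenface can be rescaled to the displayable intensity range for inspection, in the same order as the eigenvalues.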

Figure 12.

Eigenfaces of a model selected from Changan.

Figure 13.

Representative brand stylings: (a) Changan, (b) Audi, (c) Nissan, and (d) Volvo.

Conclusion

This article proposes a machine learning–based method for car frontal styling analysis, in which the analysis of car styling attributes among various car makers (or brands) is formulated as a brand classification problem. In particular, the brand styling consistency is measured by the leave-one-out classification results. Brand styling patterns are first discovered based on the features for classification and then visualized to reveal salient styling regions. To help designers perceive the branding characteristics intuitively, representative stylings of brands are presented as well. To perform the analysis, we build a large-scale database of car frontal styling, CFSDB, which includes 23 popular car brands in the Chinese market. Two points should be noted. First, in this article we focus only on brand attributes in car styling, leaving other semantic attributes such as sporty and luxury unexplored. These can be studied by providing proper styling data and semantic labels. Second, we adopt the PCANet/SVM method as our tool for brand styling classification, which proved effective in the discovery of styling patterns.

Unlike experience-dependent styling analysis methods, which rely on intensive human labor for the perception and extraction of feature lines or shapes, the proposed machine learning–based car styling analysis scheme can save designers a great deal of analysis work and provide more objective results. Moreover, the proposed analytic techniques can effectively help people reason about and perceive branding features in car styling. It should be made clear that human knowledge and experience remain very important and effective in styling analysis. Rather than replacing humans, our method serves as a complementary assessment tool for designers in styling analysis and design.

A limitation of our work is that, despite the objectiveness of our method, our results lack validation, especially for the visualization of brand styling patterns. As stated in Pan et al.,³⁰ there are two difficulties in validating a visualization: (1) defining an error metric for validity and (2) obtaining ground-truth values for validity. To address this problem, human experience can be utilized and user studies, such as eye-tracking studies, can be conducted. From a more general perspective, car frontal styling design is a complex process that must satisfy esthetic and functional requirements jointly. This article focuses on the esthetic perception of car frontal styling; functional requirements will be discussed in a future study of holistic frontal design.

Footnotes

Handling Editor: Chenguang Yang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant nos 11472073, 11732004, 11701357, and 61772104) and the Fundamental Research Funds for the Central Universities (grant no. DUT17JC12).

ORCID iDs

Ying Dong

Mingliang Song

References

1. McCormack, Cagan, Vogel. Speaking the Buick language: capturing, understanding, and exploring brand identity with shape grammars. Des Stud 2004; 25: 1–29.

2. Wang, Chen, et al. Analyzing and predicting heterogeneous customer preferences in China’s auto market using choice modeling and network analysis. SAE Int J Mater Manuf 2015; 8: 1–10.

3. Ranscombe, Hicks, Mullineux, et al. Visually decomposing vehicle images: exploring the influence of different aesthetic features on consumer perception of brand. Des Stud 2012; 33: 319–341.

4. Hyun, Lee, Kim, et al. Style synthesis and analysis of car designs for style quantification based on product appearance similarities. Adv Eng Inform 2015; 29: 483–494.

5. Chan, Jia, Gao, et al. PCANet: a simple deep learning baseline for image classification? IEEE Trans Image Process 2015; 24: 5017–5032.

6. Fan, Chang KW, Hsieh CJ, et al. LIBLINEAR: a library for large linear classification. J Mach Learn Res 2008; 9: 1871–1874.

7. Hyun, Lee, Kim, et al. Style analysis methodology identifying the car brand design trends through hierarchical clustering. In: Proceedings of the 19th international conference on computer-aided architectural design research in Asia CAADRIA rethinking comprehensive design: speculative counterculture, Kyoto, Japan, 14–16 May 2014, pp.327–336. Hong Kong: CAADRIA.

8. Hsiao. Fuzzy set-theory applied to car style design. Int J Veh Des 1994; 15: 255–278.

9. Tian, Zhang, Jia, et al. Automotive style design assessment and sensitivity analysis using integrated analytic hierarchy process and technique for order preference by similarity to ideal solution. Adv Mech Eng 2016; 8: 1–10.

10. Person, Snelders. Brand style in commercial design. Des Issues 2010; 26: 82–94.

11. Person, Schoormans, Snelders, et al. Should new products look similar or different? The influence of the market environment on strategic product styling. Des Stud 2008; 29: 30–48.

12. Karjalainen. It looks like a Toyota: educational approaches to designing for visual brand recognition. Int J Des 2007; 1: 67–81.

13. Burnap, Hartley, Pan, et al. Balancing design freedom and brand recognition in the evolution of automotive brand styling. Des Sci 2016; 2: e9.

14. Hyun, Lee, Kim. The gap between design intent and user response: identifying typical and novel car design elements among car brands for evaluating visual significance. J Intell Manuf 2015; 28: 1729–1741.

15. Abidin, Othman, Shamsuddin, et al. The challenges of developing styling DNA design methodologies for car design. In: Proceedings of the international conference on engineering and product design education, Enschede, The Netherlands, 4–5 September 2014, pp.738–743. Scotland: DESIGN SOC.

16. Ranscombe, Hicks, Mullineux. A method for exploring similarities and visual references to brand in the appearance of mature mass-market products. Des Stud 2012; 33: 496–520.

17. Jafarpour, Polatkan, Brevdo, et al. Stylistic analysis of paintings using wavelets and machine learning. In: Proceedings of the European signal processing conference, Glasgow, 24–29 August 2009, pp.1220–1224. New Jersey: IEEE.

18. Blessing, Wen. Using machine learning for identification of art paintings, 2010, http://cs229.stanford.edu/proj2010/BlessingWen-UsingMachineLearningForIdentificationOfArtPaintings.pdf

19. Bar, Levy, Wolf. Classification of artistic styles using binarized features derived from a deep neural network. In: Proceedings of the European conference on computer vision (ECCV) workshops, Amsterdam, The Netherlands, 8–14 September 2018, pp.71–84. New York: Springer.

20. Zhang, Wang, et al. Recognition of facial sketch styles. Neurocomputing 2015; 149: 1188–1197.

21. Doersch, Singh, Gupta, et al. What makes Paris look like Paris? Commun ACM 2015; 58: 103–110.

22. Lee, Efros, Hebert. Style-aware mid-level representation for discovering visual connections in space and time. In: Proceedings of the IEEE international conference on computer vision, Sydney, NSW, Australia, 1–8 December 2013, pp.1857–1864. New York: IEEE.

23. Chang, Chen. A neural network-based computer aided design tool for automotive form design. Int J Veh Des 2007; 43: 136–150.

24. Yumer, Chaudhuri, Hodgins, et al. Semantic shape editing using deformation handles. ACM Trans Graph 2015; 34: 86:1–86:12.

25. Pearce, Pears. Automatic make and model recognition from frontal images of cars. In: Proceedings of the 2011 8th IEEE international conference on advanced video and signal based surveillance, AVSS, Klagenfurt, 30 August–2 September 2011, pp.373–378. New York: IEEE.

26. Duan, Marchesotti, Crandall. Attribute-based vehicle recognition using viewpoint-aware multiple instance SVMs. In: Proceedings of the winter conference on applications of computer vision, Steamboat Springs, CO, 24–26 March 2014, pp.333–338. New York: IEEE.

27. Liu, et al. A vehicle classification system based on hierarchical multi-SVMs in crowded traffic scenes. Neurocomputing 2016; 211: 182–190.

28. Lai, Guo. Location-aware fine-grained vehicle type recognition using multi-task deep networks. Neurocomputing 2017; 243: 60–68.

29. Chen, Hsieh, Yan, et al. Vehicle make and model recognition using sparse representation and symmetrical SURFs. Pattern Recognit 2015; 48: 1979–1998.

30. Pan, Burnap, Liu, et al. A quantitative model for identifying regions of design visual attraction and application to automobile styling. In: Proceedings of international design conference, Dubrovnik, Croatia, 16–19 May 2016, pp.2157–2166. Scotland: DESIGN SOC.

31. Cortes, Vapnik. Support-vector networks. Mach Learn 1995; 20: 273–297.

32. Chang, Lin. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2011; 2: 1–39.

33. Yang L. A large-scale car dataset for fine-grained categorization and verification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, 7–12 June 2015, pp.3973–3981. New York: IEEE.

34. Krause, Deng, Stark, et al. Collecting a large-scale dataset of fine-grained cars. In: Proceedings of the second workshop on fine-grained visual categorization, Edinburg, TX, 28 June 2013. New York: IEEE.

35. Hsieh C-J, Chang K-W, Lin C-J, et al. A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th international conference on machine learning, Helsinki, 5–9 July 2008, pp.408–415. New York: ACM.

36. Van der Maaten, Hinton. Visualizing data using t-SNE. J Mach Learn Res 2008; 9: 2579–2605.

37. Turk, Pentland. Face recognition using eigenfaces. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, Maui, HI, 3–6 June 1991, pp.586–591. New York: IEEE.

38. Wang, Shang, Guo, et al. Real-time vehicle classification based on eigenface. In: Proceedings of the 2011 international conference on consumer electronics, communications and networks, CECNet 2011, Xianning, China, 16–18 April 2011, pp.4292–4295. New York: IEEE.