Abstract
Introduction
Car styling has received much research attention recently because of its significance in presenting brand identity 1 and its influence on customers’ decisions. 2 Most previous works focus on the analysis of stylistic features such as lines and shapes3,4 extracted by experts. By calculating pre-defined geometric quantities (e.g. length and curvature) of the stylistic features, these works relate the features to certain semantics. While these methods have been applied effectively to certain tasks on small datasets, they suffer from two limitations. First, because the pre-defined features are designed for specific analysis tasks, they may fall short in others, so the methods are not readily applicable across analysis tasks. Second, defining these features requires much expert knowledge and identifying them requires much human labor, so the methods do not generalize to large-scale data.
Because of these two limitations, most previous works analyze the styling design of individual cars or of a small group of examples. However, we argue that some tasks should also be investigated on a much larger dataset, for instance, the brand consistency of car styling. This is justified by the fact that brand consistency is a collective response of cars across various brands. Applying the conventional methodology to this inquiry would be prohibitively expensive.
To address this difficulty and to treat brand styling consistency as a group behavior of cars, we propose a machine learning–based method that requires no expert-engineered features. We collect a large set of car frontal view images and use the method to measure brand consistency and to discover shared brand patterns. In particular, we cast the brand styling consistency problem as a brand classification problem, which is well suited to machine learning. The classification accuracy in the prediction stage measures the degree of styling consistency of a car brand. Meanwhile, the brand styling patterns are revealed through the classification task with the proposed decoding technique.
The proposed machine learning method is composed of three stages: feature encoding, brand classification, and an additional decoding technique for visualizing the brand patterns. First, we adopt the PCANet 5 as an encoding mechanism for automatically extracting styling features. While deep neural networks are widely applied to diverse tasks, the PCANet, a shallow network architecture, is chosen partly because it does not require as large a dataset as deep learning architectures do and partly because of its simplicity. Second, brand classification is performed with the features encoded by the PCANet and a linear multi-class support vector machine (SVM). 6 The trained classifier for each brand turns out to be a vector in the feature space; therefore, we treat the classifier for each brand as the brand pattern. Third, we provide a decoding technique that maps the high-dimensional feature back to the image domain, revealing the salient stylistic features of the brands.
In the results and discussion, we first present the collected car frontal view image dataset on which the classification task is performed. The styling consistency across the collected brands is investigated, and several brands are discussed to support our methodology. Then, the decoding technique is applied to reveal and visualize the shared brand patterns. We find that the styling patterns of several brands discovered by our machine learning method closely resemble the public impression of these brands. Finally, to provide useful references for designers, frontal styling representatives of several brands extracted with our method are presented as well. To support machine learning methods, one needs to provide a sufficient amount of data for training the model. In this article, a large-scale car frontal styling dataset is built based on 23 popular brands in China.
The rest of the article is structured as follows. The research background is reviewed in section “Related works,” followed by the introduction of the proposed machine learning method. Section “Experiments and discussion” presents the experimental results and discusses them in detail, and conclusions are drawn in section “Conclusion.”
Related works
Experience dependent car styling analysis
Most works in car styling analysis rely on preprocessing with human-defined styling features. McCormack et al. 1 analyzed Buick’s brand identity through a shape grammar scheme based on feature lines extracted from Buick’s styling. Based on the extracted car-side silhouette, Hyun et al.4,7 analyzed the similarity between car styles among car brands. Hsiao 8 built a consultative program for the design process. It employed adjectival image words to represent the feelings of a customer toward a car, thereby building a connection between car styles and image words. Tian et al. 9 proposed an integrated analytic hierarchy process–technique for order preference by similarity to ideal solution (AHP-TOPSIS) method for automotive style design assessment, and they interviewed 20 experts for the performance factors corresponding to styling elements.
Despite the remarkable successes that pre-defined features have achieved in car styling analysis, these methods depend heavily on human abilities in perception, abstraction, and extraction of styling features. Therefore, these studies demand expert knowledge and intensive labor, and they do not generalize to large-scale data.
Car brand styling
Styles can be classified not only by groups, schools, regions, and periods in art 10 but also by brands, as a vehicle representing brand culture and characteristics. Person et al. 11 discussed in detail the factors influencing decision-making in product styling, noting that similar styling is more acceptable from a branding perspective. Karjalainen 12 suggested analytical methods for grasping visual brand recognition based on several car brands. Ranscombe et al. 3 proposed a decomposition method to investigate the influence of esthetic features on brand recognition of vehicles. Their experimental results validate that esthetic features in the front view have the greatest influence on consumers’ perception of brand, which is also demonstrated in Burnap et al. 13 and Hyun et al. 14 Abidin et al. 15 introduced styling DNA in the perception of the brand image by identifying character traits. With feature shapes extracted from the appearance of products, Ranscombe et al. 16 proposed calculations of geometric properties to conduct similarity analysis.
The studies above illustrate the importance of styling to a brand. However, most studies analyze brand styling through feature line–based methods, which still require much prior knowledge. To make car styling analysis more objective and intelligent, we propose a complementary data-driven approach that analyzes car brand styling by applying machine learning methodology.
Style recognition through machine learning
With the surge of machine learning methods, researchers have applied these methods to the stylistic analysis or recognition of paintings, sketches, and geography.17–21 Lee et al. 22 proposed a visual data mining approach that discovers connections between recurring midlevel visual elements in historical and geographic image collections. Chang and Chen 23 established a relationship between car profile characteristics and consumers’ image perception using a back-propagation neural network. Similar work can be found in Yumer et al. 24 as well. These studies demonstrate a remarkable ability of machine learning in pattern discovery. In our work, we take advantage of these methods in car recognition25–29 to classify styling in terms of brands, and we further measure the visual consistency and interpret the discovered patterns.
Recently, Pan et al. 30 proposed a deep learning method for identifying salient regions of design attributes in automobile styling. In particular, a deconvolution mechanism is applied to the visualization of regions for predicted attributes, which is similar to our pattern visualization. Although they studied the car styling in a collective manner, patterns behind the group behavior and the measurement of intensity in styling identity are not fully discussed, in part, because of the convolutional neural network (CNN) methods they employed. In our work, we adopt a simple and effective method to explicitly present and interpret discovered patterns in car brand styling.
A machine learning–based car brand styling analysis
In this section, a machine learning–based method is proposed to analyze the brand styling in car frontal face design and to discover any holistic feature patterns characterizing the brands. The brand styling analysis problem is formulated as a classification task, which is suitable for machine learning. This formulation is justified by the fact that brand styling can be seen as a collective behavior of a cluster of cars from a certain brand. In this way, the degree of styling consistency of a brand can be defined as the classification accuracy on the dataset.
Three components make up the brand styling analysis framework: feature encoding, brand classification with the extracted features, and the decoding technique for holistic feature visualization. In what follows, we first introduce the PCANet, which extracts the holistic features by encoding the images into a high-dimensional feature space. We then describe the SVM classifier and the training process for recognizing the brand of each input image. The measure of brand consistency is then presented as the classification accuracy on the given dataset. Finally, we describe the decoding technique devised for visualizing the holistic feature patterns learned through the classification task.
Feature encoding with the PCANet
The PCANet is a network architecture which consists of multiple image processing techniques to encode each given image in the dataset into a high-dimensional vectorized representation. The space spanned by these vectorized codes is termed feature space, and each of the vectorized codes is called a feature vector. Thus, one can see the PCANet as a non-linear mapping from the image space to the feature space, both encoding the necessary information for recognizing the attributes, such as the brand in our task, associated with a specific data sample.
The PCANet contains three stages: two consecutive stages of cascaded principal component analysis (PCA), and the last output stage of a combination of binary hashing and block-wise histogram. This hierarchical architecture and the patch process ensure a rich and detailed feature representation. Figure 1 shows how the PCANet encodes an image of car front face sample to a feature vector. The details of the PCANet are introduced below.

Pipeline of the PCANet encoding process.
Given
The covariance matrix
Each eigenvector is a column vector in
Applying
where * means image convolution operation.
This leads to
The second stage of the PCANet is to apply the PCA filtering to each filtered image output by the first stage, thus the output of second stage is obtained as follows
where
The third stage accounts for the binarization and merging of the outputs of the two consecutive PCA filtering stages. The binarization step
In this way, each primary filtered image corresponds to a weighted sum. Again, this process aims to emphasize the important regions found by eigenvectors according to their rank. To perform classification, the
The block size is closely related to the “resolution” of the final feature vector; the larger the block size, the lower resolution of the feature vector. This is relevant to the decoding process where the final output feature vector is mapped back to the blocks of an image, demonstrating the saliency of each block with respect to the brand classification and thus revealing the important pattern of the related brand. This will be discussed later in section “Styling pattern visualization via feature decoding.”
Then, a histogram (with
These high-dimensional feature vectors extracted by the PCANet from the input dataset span the so-called feature space, in which the classification is then performed.
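The three stages described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names (`learn_pca_filters`, `apply_filter`, `pcanet_encode`) are hypothetical, patches are taken around every pixel with zero padding, the patch size is assumed odd, blocks are assumed non-overlapping, and the filtering is written as a correlation (a convolution with a flipped filter). The feature vector is laid out as one histogram per first-stage filter and block.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, k):
    """All k x k windows of the zero-padded image, shape (h, w, k, k); k is odd."""
    padded = np.pad(np.asarray(img, dtype=float), k // 2)
    return sliding_window_view(padded, (k, k))


def learn_pca_filters(images, k, n_filters):
    """Learn PCA filters: leading eigenvectors of the mean-removed patch covariance."""
    patches = []
    for img in images:
        p = _windows(img, k).reshape(-1, k * k)
        patches.append(p - p.mean(axis=1, keepdims=True))  # remove each patch mean
    X = np.vstack(patches)
    cov = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    leading = eigvecs[:, ::-1][:, :n_filters]      # largest eigenvalues first
    return [leading[:, i].reshape(k, k) for i in range(n_filters)]


def apply_filter(img, filt):
    """Filter an image with one PCA filter; output has the same size as the input."""
    return np.einsum("ijkl,kl->ij", _windows(img, filt.shape[0]), filt)


def pcanet_encode(img, filters1, filters2, block):
    """Encode one image as concatenated block-wise histograms of hashed codes."""
    bh, bw = block
    n_bins = 2 ** len(filters2)
    feats = []
    for f1 in filters1:
        stage1 = apply_filter(img, f1)
        # Binary hashing: each second-stage output contributes one weighted bit.
        code = np.zeros(stage1.shape, dtype=int)
        for bit, f2 in enumerate(filters2):
            code += (apply_filter(stage1, f2) > 0).astype(int) << bit
        # Block-wise histograms over non-overlapping blocks.
        for r in range(0, code.shape[0] - bh + 1, bh):
            for c in range(0, code.shape[1] - bw + 1, bw):
                blk = code[r:r + bh, c:c + bw]
                feats.append(np.bincount(blk.ravel(), minlength=n_bins))
    return np.concatenate(feats).astype(float)


# Toy usage on random images (illustrative sizes only).
rng = np.random.default_rng(0)
images = [rng.random((12, 16)) for _ in range(5)]
f1 = learn_pca_filters(images, k=3, n_filters=2)
stage1_outputs = [apply_filter(im, f) for im in images for f in f1]
f2 = learn_pca_filters(stage1_outputs, k=3, n_filters=3)
vec = pcanet_encode(images[0], f1, f2, block=(4, 4))
```

With 2 first-stage filters, 3 second-stage filters, and a 3 × 4 grid of blocks, the toy feature vector has 2 × 12 × 2³ entries. A larger block yields fewer histograms and hence a coarser spatial resolution, as noted above.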
Brand classification with SVM
The classifier employed here is the SVM, which was initially proposed by Cortes and Vapnik 31 to solve two-group classification tasks and later generalized to multi-class problems. Many efficient libraries have been proposed, such as LIBSVM 32 and LIBLINEAR. 6
The main idea of the SVM is to find the optimal hyperplane with the normal vector
where
For multi-class classification tasks where a training dataset with
Since the entries in
According to equation (7), the prediction value
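As a rough illustration of how a linear one-versus-rest classifier assigns a brand, the sketch below assumes the standard linear form in which brand k is scored by the inner product of its weight vector with the feature vector plus a bias, and the highest score wins. The names `W`, `b`, `brand_scores`, and `predict_brand` are hypothetical, and the toy numbers are invented for the example; the exact formulation of equation (7) should be taken from the text.

```python
import numpy as np


def brand_scores(W, b, x):
    """Prediction value of every brand: one linear score per row of W."""
    return W @ x + b


def predict_brand(W, b, x):
    """Index of the brand whose classifier gives the largest prediction value."""
    return int(np.argmax(brand_scores(W, b, x)))


# Toy example: 3 brands, 2-dimensional non-negative features (histogram counts).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])   # each row plays the role of one brand's pattern vector
b = np.zeros(3)
x = np.array([0.2, 0.9])
scores = brand_scores(W, b, x)
predicted = predict_brand(W, b, x)
```

Because histogram features are non-negative, only the positive entries of a row of `W` can raise that brand's score, which is the property the decoding step later relies on.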
Measuring styling consistency of a brand
Intuitively, a brand has higher styling consistency if more of its car models can be correctly identified from their styling and the discovered brand styling pattern. To turn this idea into a metric, we use the brand recognition rate to evaluate brand styling consistency. A widely used practice in classification tasks is to split the dataset into two subsets, one for training and the other for testing. However, our goal is not to provide a generic classifier for future classification tasks; instead, we use the classifier as a tool for measuring the degree of styling consistency of a brand within the given data collection. Therefore, we adopt the leave-one-out scheme.
Given a car image
where
While the styling consistency of a brand is evaluated based on the correct recognitions, we can investigate similarity among brands by evaluating the false recognitions. A falsely recognized car model of a brand indicates that its styling is similar to the styling pattern of the recognized brand according to our framework; thus, the number of models of one brand falsely recognized as another reflects the styling similarity of the two brands. Accordingly, a visualization scheme is employed to present the styling similarity among brands. Figure 2 shows a diagram of brand styling similarity. Region A refers to the collection of samples recognized correctly as brand A. Region BA contains samples belonging to brand B but falsely recognized as brand A. The other regions in Figure 2 are interpreted likewise.

Diagram for visualization of brands’ styling similarity.
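The leave-one-out measurement and the counting of false recognitions can be sketched as follows. This is an illustration under assumptions: a nearest-centroid classifier stands in for the PCANet/SVM pipeline to keep the example short, and all function names and the toy data are hypothetical. Row i of the confusion matrix counts how the samples of brand i were recognized, so the diagonal divided by the row sums gives the per-brand recognition rate, and the off-diagonal entries correspond to regions such as BA in Figure 2.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Stand-in classifier (not the SVM of the article): one centroid per brand."""
    labels = np.unique(y)
    centroids = np.stack([X[y == k].mean(axis=0) for k in labels])
    return labels, centroids


def nearest_centroid_predict(model, x):
    labels, centroids = model
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]


def leave_one_out(X, y, fit=nearest_centroid_fit, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and tally the predictions."""
    labels = np.unique(y)
    confusion = np.zeros((len(labels), len(labels)), dtype=int)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit(X[keep], y[keep])
        pred = predict(model, X[i])
        confusion[np.searchsorted(labels, y[i]), np.searchsorted(labels, pred)] += 1
    consistency = confusion.diagonal() / confusion.sum(axis=1)
    return labels, consistency, confusion


# Toy data: brand 0 is tight; brand 1 has one model that resembles brand 0.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10], [0.5, 0.5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 1])
labels, consistency, confusion = leave_one_out(X, y)
```

In this toy case `confusion[1, 0]` is 1: one model of brand 1 falls on the side of brand 0, which lowers the consistency of brand 1 and counts toward the similarity between the two brands.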
Styling pattern visualization via feature decoding
Because a pattern vector also lies in the feature space, decoding can be achieved by inverting the feature encoding process. Recalling the block-wise histogram computation in the output stage of the PCANet, any vector in the feature space can be mapped back to blocks in the image domain. Equation (7) implies that the positive entries in the pattern vector are significant in classification and thus carry essential information about brands. The combination of positive entries and their corresponding blocks therefore indicates the salient locations in an image that determine its category during the classification task. Following this rationale, we visualize the salient area of a brand in an image and regard the salient region as the styling pattern of the brand.
As mentioned above, the block size is relevant to the visualization of brand styling pattern. Suppose the size of a block is set to be
where
The proposed scheme transforming a pattern vector into an image is shown in Figure 3. Specifically, based on the process in the output stage of the PCANet, a pattern vector is first divided into

Process of decoding a portion of pattern vector into the salient image. The pattern vector is first divided into
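One way to realize this decoding is sketched below. It is a hedged reconstruction, not the authors' exact procedure: it assumes non-overlapping blocks, assumes the pattern vector is ordered as one histogram per first-stage filter and block, keeps only positive entries, and sums them per block. The function name `decode_pattern` and its parameters are hypothetical.

```python
import numpy as np


def decode_pattern(w, n_filters1, n_bins, grid, block):
    """Map a pattern vector to a saliency image.

    w          : pattern vector of length n_filters1 * rows * cols * n_bins
    n_filters1 : number of first-stage filters
    n_bins     : histogram bins per block (2 ** number of second-stage filters)
    grid       : (rows, cols) of blocks in the image
    block      : (height, width) of one block in pixels
    """
    rows, cols = grid
    segments = np.asarray(w, dtype=float).reshape(n_filters1, rows * cols, n_bins)
    positive = np.clip(segments, 0.0, None)            # keep positive entries only
    saliency = positive.sum(axis=(0, 2)).reshape(rows, cols)
    if saliency.max() > 0:
        saliency = saliency / saliency.max()           # scale to [0, 1] for display
    return np.kron(saliency, np.ones(block))           # paint each block's value


# Toy pattern vector: 1 filter, 2 bins, a 1 x 2 grid of 2 x 2 blocks.
w = np.array([1.0, -3.0, 2.0, 2.0])
salient_image = decode_pattern(w, n_filters1=1, n_bins=2, grid=(1, 2), block=(2, 2))
```

The resulting array can be rendered with a color map and overlaid on a car image, with larger values marking blocks that contribute more to recognizing the brand.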
Experiments and discussion
In this section, implementation details of the proposed method are described first, including the introduction of a car frontal styling database (CFSDB) and the parameters chosen for training the PCANet/SVM framework. Then, experimental results are presented and discussed in three aspects: brand styling consistency, visualization of brand styling pattern, and representative brand styling.
CFSDB
To support style-based car brand classification, a car styling database is required. There are some benchmark car datasets33,34 in the computer vision community, in which cars are usually captured through traffic cameras or taken from various viewpoints. However, styling features may not be holistically perceived and precisely extracted from these very large databases because of the lighting, occlusions, and varying viewing angles. Given our task of studying car styling within and across brands, a dedicated car styling database is needed. As stated in the related works, the frontal view of a car reveals most of the brand’s style and has more impact on the visual perception of branding characteristics. Thus, we build the CFSDB to facilitate car styling analysis research.
Since styling elements in the car frontal view are mainly concentrated in the headlights, grille, and bumper, 3 we select the union of these details as the analyzed styling region, that is, the styling region(s) of interest (ROI). Because the aspect ratios of the styling ROI differ between sedans and SUVs, we build two datasets accordingly. Figure 4 shows some car styling samples of Audi and Volkswagen in the two datasets. Both datasets are well aligned in a front view. The sedan dataset contains 4726 images corresponding to 22 brands, and the SUV dataset contains 2441 images corresponding to 21 brands. These brands are selected considering both sales and how widely their brand styling is recognized; the 23 selected brands and the number of car models in each are shown in Figure 5.

(a) Audi samples in the sedan dataset, (b) Volkswagen samples in the sedan dataset, (c) Audi samples in the SUV dataset, and (d) Volkswagen samples in the SUV dataset.

The numbers of car models in brands: (a) sedan dataset and (b) SUV dataset.
Parameter settings of PCANet/SVM
Before exploring styling patterns, we first determine a proper image size to balance classification accuracy and computational cost. The cross-validation experiments are conducted on 2200 samples randomly chosen from 22 sedan brands and 1260 samples from 21 SUV brands. The statistics of the experiments are shown in Figure 6, and the proper sizes of the sedan ROI and SUV ROI are 48 × 120 and 52 × 112 pixels, respectively.

ROI size optimization: (a) sedan and (b) SUV.
Then, an experiment is conducted to find proper parameters of the PCANet, in which four parameters are considered: patch size, number of filters, block size, and block overlap ratio. The optimal results are shown in Table 1, which are also decided based on accuracy and computation cost. For the implementation of SVM, we adopted LIBLINEAR for its multi-classification strategy of one-versus-rest, in which the solver
Parameters in the PCANet.
With proper parameters, a leave-one-out cross-validation is first conducted to obtain the distribution of accuracies of all brands. Then, a classification on the complete dataset is conducted to obtain the brand styling patterns, in which the block size and block overlap ratio are adjusted as described for the pattern decoding. It should be noted that this minor adjustment barely changes the classification performance but simplifies the visualization.
Styling consistency analysis
After completing the leave-one-out classification experiment, we first visualize the distribution of models in the sedan dataset based on their PCANet features and classification results. Figure 7 shows the overall distribution of classification results generated by t-distributed stochastic neighbor embedding (t-SNE), 36 with a perplexity of 30, a learning rate of 500, and 1000 iterations. We can see that the models belonging to Audi, Benz, and Volkswagen are clustered and distant from the others, reflecting their high degree of brand consistency and distinctive brand styling, while the models that are mixed together illustrate the similarity among the corresponding brands.

Visualization of classification results.
The detailed accuracies of the sedan brands are shown in Figure 8. European brands such as Audi, BMW, and Volkswagen achieve high recognition rates, which suggests that these brands possess a higher degree of styling consistency. For Asian brands such as Toyota, Hyundai, and Geely, the relatively low recognition rates indicate that more car models within these brands deviate from the discovered identities, reflecting a higher degree of styling diversity. More specifically, BMW and Audi achieve the highest accuracies with 100.00% and 99.73%, followed by Volvo, Skoda, and Volkswagen with 99.56%, 99.43%, and 99.41%, respectively, illustrating their high brand styling consistency. This accords with market perception and other studies.7,15 In contrast, brands such as Toyota, Hyundai, Nissan, Geely, and Chery, whose accuracies are lower than the others, can be considered to lack styling consistency across their models, at least from the frontal view. This may reflect the diversity of car styling within these brands.

Accuracies of sedan brands.
To further verify the styling consistency or diversity of brands based on the classification results, we present a styling similarity visualization method that shows the falsely recognized car models as a regional distribution.
Figure 9 shows the falsely recognized samples related to Audi, Volkswagen, and Chery, respectively. As shown in Figure 9, few models of Audi and Volkswagen are recognized as other brands, whereas many models of Chery are falsely recognized as other brands such as Geely, Volkswagen, and Hyundai. Besides confirming the conclusion drawn above that Audi and Volkswagen possess a higher degree of styling consistency whereas Chery shows styling diversity, the figure allows the styling similarity to be judged preliminarily from the number of falsely recognized cross samples. Briefly, more samples in the boundary region between two brands indicate a higher degree of styling similarity between them. For example, a higher similarity is observed between Geely and Volkswagen, because more of their car models deviate from the pattern of their own brand and fall on the other side. It should be noted that this visualization can only provide a preliminary conclusion about similarity, and further validation, such as a user study, is needed.

Styling similarity analysis: (a) Audi, (b) Volkswagen, and (c) Chery.
With the observations above, we can further check the incorrectly recognized car models to analyze which models have similar styling across brands, as shown in Figure 9(a). From Figure 9(a), we can see that only one Audi model is recognized as Kia and that this model was produced around 2004, when Audi models were not yet designed with the “wide-mouth” grille. Apparently, the styling of this model falls away from the styling pattern of Audi, and the model is falsely recognized as Kia, to which its styling is closer. Among the car models of other brands recognized as Audi, two possess a “wide-mouth” grille similar to Audi’s styling. Thus far, we can only guess that the styling pattern discovered for Audi may be the “wide-mouth” grille, because two models are recognized as Audi despite showing no apparent similarity to Audi’s pattern from a human perspective. Therefore, we need to present the pattern of every brand to make further explanation possible, which is discussed in the following section.
Visualization of brand styling pattern
Using the decoding process designed for visualizing brand patterns, we now visualize the brand patterns extracted in the classification task. The brand patterns are transformed into saliency maps, with warmer/colder colors for higher/lower saliency. Each saliency map is overlaid on a representative car image of the corresponding brand. Figure 10 shows the detailed visualization process of Audi’s styling pattern, in which the highlighted area in the recovered images represents the key styling region for the brand. Figure 11 shows the visualization results of Audi, BMW, Lexus, and Chery.

Salient images from the styling pattern of Audi: (1) the dividing process of pattern vector, (2) the summarization, and (3) the rearrangement from a feature vector to a salient image in which salient regions are in red.

Salient images from brand styling patterns: (a) Audi, (b) BMW, (c) Lexus, and (d) Chery. Warmer color indicates a high degree of saliency.
From Figure 11(a), we can see that the salient styling region for Audi is mainly distributed around the grille and logo area. This indicates that the grille is the most distinctive styling feature for Audi besides the logo. More specifically, however, the brighter areas are located in the upper and side regions of the grille. This deviates from the market perception of Audi’s “wide-mouth” grille as a complete shape. Nevertheless, we consider this result valuable because it locates more detailed features instead of relying on subjective human perception. It should also be acknowledged that human perception, such as the ability to abstract, plays a significant role in styling assessment.
In Figure 11(b), it is evident that the center region of the grille plays the most important role in distinguishing BMW from other brands. Although this deviates from the perceived “kidney grille,” we can intuitively observe that it is the separation in the center that forms BMW’s double “kidney grille.” Therefore, the pattern discovered for BMW can be regarded as capturing more detailed styling features that help interpret the semantic shapes perceived by humans. Similarly, Lexus’ pattern shown in Figure 11(c) can explain the distinctive spindle grille, apart from the highlighted logo.
However, as shown in Figure 11(d), nearly the whole region of Chery’s frontal styling is highlighted, with several slightly darker regions scattered around the logo. This may indicate that Chery has a variety of styling types in its products and lacks a unique and unified brand styling; therefore, the discovered pattern can only rely on the holistic styling to achieve brand classification.
From the above results, we can see that the logos of some brands affect brand identification to some extent, and a possible solution is to design a scheme that removes the logo without affecting the styling.
Together with the results of the consistency analysis, it can be observed from Figure 11 that brands with high styling consistency, such as Audi, BMW, and Lexus, usually have more centralized and distinctive styling elements, whereas Chery possesses a holistic styling with relatively low consistency.
Representative brand styling
To help designers perceive the brand styling characteristics directly, we present the representative styling of a brand. The representative styling of a brand can be considered as a combination and extraction of the several car models with the largest prediction values for the brand.
Here, we take Changan as an example. First, we choose the 10 images considered to possess the most recognizable styling for the brand based on equation (7). Table 2 shows the top 10 images in the recognition of Changan. With these images, an eigenface technique37,38 is then applied to generate the representative brand styling. The main representative eigenvectors in the brand styling space are visualized and then combined to generate the representative styling.
Top 10 images of most recognizable models from Changan.
Figure 12 shows the generated eigenfaces of Changan, ordered by eigenvalue. Combining the top five eigenfaces, we obtain the computed styling, which represents the brand styling. Because the generated styling is blurred, a professional stylist was consulted to render it; the rendered result is named the representative brand styling. Figure 13 presents the computed stylings and rendered representative brand stylings of Changan, Audi, Nissan, and Volvo. A simple validation experiment is conducted to verify whether these stylings can be recognized as the brands they represent. The results support our conclusion.

Eigenfaces of a model selected from Changan.

Representative brand stylings: (a) Changan, (b) Audi, (c) Nissan, and (d) Volvo.
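The selection of the most recognizable images and the eigenface computation can be sketched as below. This is a minimal illustration with hypothetical function names and random stand-in images. The text does not specify how the top five eigenfaces are combined, so the `combine` function with user-supplied weights is an assumption, and the uniform weights in the usage example are only a placeholder.

```python
import numpy as np


def top_images(scores, n=10):
    """Indices of the n samples with the largest prediction values for a brand."""
    return np.argsort(scores)[::-1][:n]


def eigenfaces(images):
    """Mean image, eigenfaces (largest eigenvalue first), and their eigenvalues."""
    A = np.stack([np.asarray(im, dtype=float).ravel() for im in images])
    mean = A.mean(axis=0)
    _, s, Vt = np.linalg.svd(A - mean, full_matrices=False)
    eigvals = s ** 2 / (len(images) - 1)      # eigenvalues of the sample covariance
    shape = np.asarray(images[0]).shape
    return mean.reshape(shape), Vt.reshape((-1,) + shape), eigvals


def combine(mean, faces, weights):
    """Assumed combination rule: mean image plus a weighted sum of eigenfaces."""
    weights = np.asarray(weights, dtype=float)
    return mean + np.tensordot(weights, faces[:len(weights)], axes=1)


# Toy usage: 30 random "ROI images" and random prediction values for one brand.
rng = np.random.default_rng(1)
roi_images = rng.random((30, 6, 8))
scores = rng.random(30)
best = top_images(scores, n=10)
mean_face, faces, eigvals = eigenfaces(roi_images[best])
computed_styling = combine(mean_face, faces, weights=np.ones(5))
```

Since the sign of each eigenface is arbitrary, any concrete weighting scheme would need to fix a sign convention before the eigenfaces are summed.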
Conclusion
This article proposes a machine learning–based method for car frontal styling analysis, in which the analysis of car styling attributes among various car makers (or brands) is formulated as a brand classification problem. In particular, the brand styling consistency is measured by the leave-one-out classification results. Brand styling patterns are first discovered based on the features used for classification and then visualized to reveal salient styling regions. To help designers perceive the branding characteristics intuitively, representative stylings of brands are presented as well. To perform the analysis, we build a large-scale database of car frontal styling (CFSDB), which includes 23 popular car brands in the Chinese market. Two points should be noted. First, in this article we focus only on brand attributes in car styling, leaving other semantic attributes such as sporty and luxury unexplored. These can be studied by providing proper styling data and semantic labels. Second, we adopt the PCANet/SVM method as our tool for brand styling classification, and it proved effective in the discovery of styling patterns.
Unlike the experience-dependent styling analysis methods, which rely on intensive human labor for the perception and extraction of feature lines or shapes, the proposed machine learning–based car styling analysis scheme can save designers a great deal of analysis work and provide more objective results. Moreover, the proposed analytic techniques can effectively help people reason about and perceive branding features in car styling. It needs to be made clear that human knowledge and experience are still very important and effective in styling analysis. Rather than replacing humans, our method serves as a complementary assessment tool for designers in styling analysis and design.
A limitation of our work is that, despite the objectiveness of our method, our results lack validation, especially the visualization of brand styling patterns. As stated in Pan et al., 30 there are two difficulties in validating visualizations: (1) defining an error metric for validity and (2) obtaining ground-truth values for validity. To address this problem, human experience can be utilized and user studies, such as eye-tracking, can be conducted. From a more general perspective, car frontal styling design is a complex process in which esthetic and functional requirements must be met together. This article focuses on the esthetic perception of car frontal styling, and functional requirements will be discussed in a future study of holistic frontal design.
