Abstract
Processing faces and facial expressions is crucial for all forms of social communication, and the interpretation of emotion is culturally dependent. Most existing databases are based on Caucasian, East Asian (Chinese, Japanese, Korean), or African-American faces. Few databases contain Indian faces. The AIIMS Facial Toolbox for Emotion Recognition (AFTER) database would be useful in the Indian context for research on facial emotion recognition.
A wealth of interpersonally relevant information is gathered by observing faces and their expressions. Recognition of affective states, particularly the basic emotions, is a prerequisite for intact social behaviour.1 Deficits in processing these cues could make appropriate reactions impossible.2,3 Images of faces and facial expressions are commonly used as stimulus materials in diverse research fields.
Facial expressions have been called the universal language of emotion.4 This view holds that all humans communicate six basic internal emotional states (happiness, anger, sadness, surprise, fear, and disgust) using similar facial movements, by virtue of their biological and evolutionary origins. In contrast, many recent researchers have challenged this notion and suggested that the perception and interpretation of emotion are culture-dependent.5–8 While classic studies demonstrated that emotion recognition was above chance even for individuals from disparate cultures,9,10 they also noted that recognition was more accurate when the emotions were both expressed and perceived by members of a similar culture.11 The facial stimuli in existing databases tend to vary substantially in facial feature characteristics and in the expression of emotions, depending on the culture from which the database was built. For example, the Radboud Faces Database (RaFD), the FACES database, and the Karolinska Directed Emotional Faces (KDEF) database contain only Caucasian models,12–14 while the Racially Diverse Affective Expression (RADIATE) face stimulus set and the Tsinghua Facial Expression Database contain only Chinese faces.15,16
A few databases containing Indian faces have been developed, making an invaluable contribution to research on emotion processing.17–20 However, the parameters of existing facial picture sets may not always satisfy the objectives of an experiment. For example, some databases have images of only a few actors;17 others contain less intense emotions because they were developed for computer algorithms,18 or were developed long ago and are available only in black and white.19 These inadequacies leave a gap in the existing databases, creating a need for a more standardized toolbox that can be used by the brain research community in the Indian subcontinent. In this report, we present the development of a validated database of static images of Indian faces for the recognition of emotions.
Materials and Methods
Development of the Database
The data presented in the current paper on static facial images are part of a larger study developing an entire toolbox for various facets of emotion recognition. This cross-sectional study was conducted from March 2019 to August 2021 at the Department of Psychiatry, All India Institute of Medical Sciences (AIIMS), New Delhi, India. The Institute Ethics Committee approved the study. The current database contains front-gazing portrait images of 15 participants. The participants (models) were students in their final year of graduation at a professional drama college in India, the National School of Drama (NSD) (six males and nine females; mean ± SD age = 26.2 ± 1.93 years).
Based on the previous literature, eight facial expressions were selected: neutral, happiness, anger, sadness, disgust, contempt, fear, and surprise.21 The models expressed these emotions at high and low intensity, with the eyes directed straight ahead. This yielded 120 low-intensity and 120 high-intensity raw pictures (15 models × 8 expressions × 2 intensities).
Before the photo shoot, the models were given detailed instructions about the targeted emotions and practiced all the emotional expressions. The photo shoot took place at NSD, against a light background in a brightly lit room. Each model took approximately 45 minutes to pose for all the expressions, taking intermittent breaks to transition from one emotion to another. Each model posed for the eight facial expressions (seven emotions plus neutral) at high and low intensity, as agreed upon by the experts. Throughout the photo shoot, a psychiatrist and a psychologist discussed each expression after photographing each model, and they proceeded only after reaching consensus that the concerned emotion had been validly displayed.
Validation of the Database
Participants
The database was validated by 21 raters (11 males and 10 females; mean ± SD age = 28.6 ± 12.1 years): two faculty members of NSD and 19 qualified mental health professionals from the Department of Psychiatry of a tertiary care hospital. All raters had normal or corrected-to-normal vision and participated voluntarily without compensation.
Procedure
Raters were presented with images of the different facial expressions in randomized order on a 14-inch computer screen. They were seated 50–70 cm from the screen and proceeded at their own pace in the presence of a researcher (NPS or NK). For each image, raters chose the label that best fit the depicted emotion from nine response categories: happiness, anger, sadness, neutral, surprise, fear, contempt, disgust, and other.
Raters then judged each image on three dimensions using 5-point Likert scales: (a) intensity of the expression (weak to strong); (b) clarity of the expression (unclear to clear); and (c) genuineness of the expression (fake to genuine).
Analysis
Facial expression recognition was evaluated using the hit rate, that is, the percentage of raters whose label matched the intended expression, reported with its standard deviation (SD).19,21–23 Hit rates were first calculated for each image. The 11 images with the highest hit rates for each emotion were then selected, and the mean hit rate for each emotion was calculated from these image sets.
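The hit-rate procedure can be sketched in Python. This is a minimal illustration, not the authors' analysis code: the function names, image IDs, and toy ratings are hypothetical, and the use of the sample SD (`statistics.stdev`) is an assumption, since the text does not state which SD estimator was used.

```python
from statistics import mean, stdev

def hit_rate(labels, intended):
    """Percentage of raters whose label matches the intended expression."""
    return 100.0 * sum(label == intended for label in labels) / len(labels)

def select_top_images(images, intended, k=11):
    """Rank images by hit rate and keep the k best.

    images maps a (hypothetical) image ID to the list of labels given by raters.
    Returns the selected IDs plus the mean and sample SD of their hit rates
    (k must be at least 2 for the sample SD to be defined).
    """
    rates = {img_id: hit_rate(labels, intended) for img_id, labels in images.items()}
    # sorted() is stable, so ties keep the order in which images were listed.
    selected = sorted(rates, key=rates.get, reverse=True)[:k]
    selected_rates = [rates[img_id] for img_id in selected]
    return selected, mean(selected_rates), stdev(selected_rates)

# Toy data (hypothetical): three "fear" images, each labeled by four raters.
fear_images = {
    "m01_fear": ["fear", "fear", "surprise", "fear"],   # 75% hit rate
    "m02_fear": ["fear", "fear", "fear", "fear"],       # 100%
    "m03_fear": ["fear", "other", "fear", "disgust"],   # 50%
}
selected, mean_rate, sd_rate = select_top_images(fear_images, "fear", k=2)
print(selected, mean_rate, round(sd_rate, 2))  # ['m02_fear', 'm01_fear'] 87.5 17.68
```

With the default `k=11`, the same function mirrors the selection of the 11 highest-scoring images per emotion; breaking ties by listing order is an arbitrary choice made only for this sketch.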
In addition, we calculated Fleiss’ kappa, a chance-corrected measure of agreement among multiple raters assigning categorical labels. For each image, we also calculated the mean ratings on the dimensions of clarity, intensity, and genuineness.
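Fleiss’ kappa can be computed from a matrix of label counts (images × response categories). The sketch below uses the standard Fleiss (1971) formulas for the overall coefficient and for a category-specific kappa. The text does not state how the per-emotion kappas in Table 2 were derived, so treating them as category-specific kappas is an assumption; the function names and toy data are hypothetical.

```python
from collections import Counter

def count_matrix(labels_per_image, categories):
    """Rows are images, columns are categories, cells count raters choosing that category."""
    rows = []
    for labels in labels_per_image:
        tally = Counter(labels)
        rows.append([tally[c] for c in categories])
    return rows

def fleiss_kappa(counts):
    """Overall Fleiss' kappa; every image must be rated by the same number of raters."""
    N = len(counts)       # number of images
    n = sum(counts[0])    # raters per image
    if any(sum(row) != n for row in counts):
        raise ValueError("every image needs the same number of ratings")
    k = len(counts[0])
    # Proportion of all label assignments that fell into each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per image: proportion of agreeing rater pairs.
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

def category_kappa(counts, j):
    """Fleiss' category-specific kappa for column j (undefined if p_j is 0 or 1)."""
    N = len(counts)
    n = sum(counts[0])
    p_j = sum(row[j] for row in counts) / (N * n)
    disagreement = sum(row[j] * (n - row[j]) for row in counts)
    return 1 - disagreement / (N * n * (n - 1) * p_j * (1 - p_j))

cats = ["happiness", "anger"]
perfect = count_matrix([["happiness"] * 3, ["anger"] * 3], cats)   # [[3, 0], [0, 3]]
mixed = count_matrix([["happiness", "happiness", "anger"],
                      ["anger", "anger", "happiness"]], cats)      # [[2, 1], [1, 2]]
print(fleiss_kappa(perfect), round(fleiss_kappa(mixed), 3))  # 1.0 -0.333
```

A kappa of 1 indicates perfect agreement, whereas values at or below 0 indicate agreement no better than chance, as the `mixed` toy example shows.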
Results
All models in the data set portrayed seven emotional expressions along with a neutral expression. Validation began with two experts (faculty from NSD), who assessed the high- and low-intensity images individually. Expert consensus for the recognition of low-intensity emotions was very low; hence, the low-intensity pictures were excluded and not assessed further. Finally, 120 high-intensity facial images were assessed by 21 raters on four measures: expression label, intensity, clarity, and genuineness.
Emotion Recognition
For each image, we calculated how many raters chose the correct emotion label. Accuracy in portraying a given emotion varied across models. We calculated hit rates for each image and selected the 11 images with the highest hit rates for each emotion for inclusion in the database, the AIIMS Facial Toolbox for Emotion Recognition (AFTER). The hit rate of each selected image for each emotion is provided in Table S1. The overall mean hit rate across all emotions in the database was 75.5% (SD = 24.00). The percentage hit rates for each emotion are shown in Table 1. The hit rate for contempt was very low (20%, SD = 13.94); hence, contempt was removed from further analysis. After excluding contempt, the mean hit rate across the remaining emotions was 84.3% (SD = 8.67). The mean kappa for emotional expression was 0.68. Happiness, anger, surprise, neutral, and fear expressions had mean proportion correct scores ranging between 0.70 and 0.89, whereas the kappa scores for disgust and sadness were relatively lower (Table 2).
Table 1. Hit Rate for Emotion Recognition and Dimensional Scores
SD: standard deviation.
Table 2. Inter-rater Agreement for Individual Emotion Recognition
Discussion
This paper presents a new database of Indian faces with seven facial expressions (happiness, anger, sadness, disgust, surprise, fear, and neutral). The pictures were validated for expression recognition and rated on three dimensions: intensity, clarity, and genuineness. These ratings allow the quality of the data set to be assessed and standardized.
We calculated the inter-rater consensus for absolute emotion recognition as indexed by the hit rate and Fleiss’ kappa coefficient. The mean hit rate for overall emotion recognition was 84.3%, which is comparable to or even higher than those of other international databases: the Pictures of Facial Affect, with a mean accuracy of 88%;24 FACES, with mean hit rates ranging from 67% to 96% across emotions;12 RaFD, with an average of 82% correct responses;13 and KDEF, with a mean hit rate of 71.87%.18,25
In the context of other Indian emotional face databases, to the best of our knowledge, only two groups have conducted a validation study. One of these databases, the Tool for Recognition of Emotions in Neuropsychiatric Disorders (TRENDS), reported a higher hit rate (80% to 100%), with an inter-rater agreement of 60% and an internal consistency (Cronbach’s alpha) of 0.669.17,26 However, its pictures were evaluated using a “forced choice” design, in which the raters had to select one of the emotions from a predetermined list. The relative merits of such designs have been hotly debated. Researchers have suggested that they prime participants to interpret stimuli as expressions of emotion and inflate agreement by constraining choices.27 Forced choice has also been observed to produce consensus on clearly incorrect categories when relevant choices are missing from the list.28–30 In our study, we sought to mitigate this bias by adding one more category, “other,” where the raters could freely label the expression. The authors of TRENDS reported the reliability of the database but did not report inter-rater variability. TRENDS also used only four models, compared with 15 in the current database. In the current study, the percentage of correct responses is reported for each individual image across all raters, reducing the possibility of a falsely high consensus amongst the raters for a given image. For a few faces (listed in the supplementary table), sadness and disgust may be difficult to ascertain precisely, which could explain the somewhat lower kappa values for these emotions compared with the others. In future research, choosing images with a lower percentage of “other” responses may give more accurate depictions of these emotions.
Another widely used Indian database, developed by Mandal,19 has five photographs each for six emotions (anger, disgust, happiness, sadness, surprise, and fear), on which 70% of raters (out of 100) agreed. However, the pictures were rated on a 7-point scale evaluating the intensity rather than the recognition of emotion, with neutral or no emotion at one end of the scale and only the intended emotion at the other.
Another database is the Indian Spontaneous Expression Database for Emotion Recognition.18 The expression annotation and intensity of each expression were decided by averaging the ratings of four decoders, and the agreement between these four raters, evaluated using Fleiss’ kappa coefficient, was 0.85. However, the models were not experts in expressing emotions, and the videos used to elicit emotions were not standardized. The authors also classified only four emotions using machine learning algorithms, which could have increased the propensity for type I error.
Notably, the hit rate for recognition of contempt was substantially low (20.3%), and contempt was therefore excluded from the analysis. This finding is in accordance with previous studies reporting that contempt is the least recognized of the emotions.13,31 The literature holds divergent views on whether contempt is a universal emotion.32,35 The expression and recognition of contempt are highly dependent on culture and context.5 Hence, most facial emotion databases do not include contempt.
Apart from validating the database, we also assessed the ratings for each emotion on the dimensions of intensity, clarity, and genuineness. All emotions had high mean scores on these dimensions, except for contempt. Individuals with deficits in emotion recognition, such as those with schizophrenia, have difficulty recognizing less intense expressions.19 Expressions without clear cues require more attention to decode, whereas extreme expressions combine several facial cues, which helps structure the visual field even when less attention is available. These factors probably have a direct bearing on performance in recognizing extreme expressions. Given the high dimensional ratings, the current dataset can be assumed to consist of intense, clear, and genuine depictions of emotions, supporting its possible use in future research on emotion recognition.
Based on the hit rates and the good inter-rater reliability, it might be concluded that the AFTER database offers a valid set of affective stimuli for recognizing emotions. These static pictures can be used freely for emotion research. Researchers can select pictures based on parameters such as the quality of the emotional expression, hit rate, intensity, clarity, and genuineness.
The current database is limited by not including faces from different age groups. However, most databases worldwide have used faces of adult models only. The literature suggests that an emotion is more difficult to identify when expressed by an older face than by a younger one.36 We did not analyze the database for gender-based differences between images; this may be attempted in a further post hoc analysis. The targeted emotional expressions were based on expert consensus rather than on standard prototypes such as those defined by the Facial Action Coding System.37 The low-intensity images were not included in the database because expert consensus on their recognition was very low. Dynamic stimuli can cover the whole range of emotions and more closely mimic real-life situations. Reading and listening are other aspects of emotion recognition that reflect real-life scenarios. The current study reports a static image database; however, the final emotion toolbox will also comprise these other facets of emotion recognition. In addition, the validation was performed by a small number of qualified mental health professionals, who are trained to perceive emotions better than the general population. Further validation in different sub-populations is required before the database can be extended to the general population. Future studies may develop databases with differential intensity of emotional expressions to understand the impact of varying intensity on emotion recognition.
The clinical utility of the AFTER database could lie in ascertaining the emotion recognition capability of individuals with various neuropsychiatric conditions, assessing change in emotion perception when patients move from an acute to a remitted state, or predicting the likelihood of relapse. Because it is computerized, it may be used to develop emotion recognition tasks for brain imaging studies using techniques such as functional magnetic resonance imaging, functional near-infrared spectroscopy, quantitative electroencephalography, and eye tracking, with a more culturally relevant stance.
Conclusions
The AFTER database would be useful in the Indian context for research in the field of emotion recognition. Such a culturally sensitive database may help capture the perception of emotion from an ethnic perspective. AFTER has been validated in a cohort of experts and was found to have good inter-rater reliability. The database shows promise for use in research settings but needs to be validated in the general population.
Supplemental Material
Supplemental material for this article is available online.
