Abstract
1. Theoretical Background
1.1. Robot as the Other: Socio-psychological Background of Human-Robot Interaction
On the basis of the last decade of human-robot interaction research, many transdisciplinary researchers have asserted that a robot should be explained and designed as an "other human." Such assertions are based on socio-psychological theories of understanding another person's mind. One can understand others not by analysing their intelligence, but simply by observing and simulating the embodied knowledge they acquire from repeated interactions with their social environments [1–5]. From this perspective, previous studies have emphasized human behaviours and emotional expressions (embodied knowledge) as more useful communication techniques than human cognitive elements (intelligence); accordingly, the concept of embodied knowledge has been adopted for designing social robots. Dautenhahn and Werry, for example, attempted to explain the sociability of robots from the viewpoint of autism, a disorder of social interaction [6].
In such contexts, the imitation (or simulation) of others' behaviours and emotions has given researchers insight into the design of social robots for interaction with humans [7]; we refer to this as the "homo-affinity of robots" in the present study.
Active Intermodal Mapping (AIM), a social imitation model proposed by Meltzoff and his colleagues, has played an important role in studies on social robots [8–10]. According to this model, an infant has inborn social strategies that enable it to communicate with its caregivers by imitating their behaviours and to understand their minds [11]. Furthermore, an infant's imitative behaviours are not determined by static innate cognitive elements; instead, they involve a dynamic goal-directed matching process and proprioceptive feedback [12].
Breazeal et al. adopted the AIM theory for designing the interactive behaviours of social robots [9]. They sought to develop social learning mechanisms through which robots can respond interactively to humans. To implement the learning mechanism, they introduced the concept of the "emotional states" of a robot and explained how these emotional states can influence the decision making underlying the robot's social behaviours.
1.2. Aesthetic Imitation: Aesthetic Background of Human-Robot Interaction
In social robot research, the concept of imitating others' behaviours or emotional expressions is adopted for explaining how to understand others' minds. Similarly, in the field of aesthetics, the concept of mimesis (aesthetic imitation) is adopted for explaining aesthetic experiences and artistic behaviours. Mimesis can be defined as the mimicking of objects or actions through aesthetic attitudes or artistic methods. It is an abstract or metaphysical concept that governs the creation of works of art, such as paintings, theatrical performances and musical compositions, which facilitate the transmission of culture [14]. Mimesis refers to a type of imitative interaction with objects or others in an aesthetic state of mind.
Thus, aesthetic experiences based on mimesis have a common social substructure, in which mimesis can link subjects (humans) with objects (robots) on the basis of aesthetics. Moreover, according to aesthetics, one can enhance one's social interests by sharing one's mind or emotions with others through aesthetic imitations.
Aesthetic theories state that aesthetic experiences include both cognitive and emotional processes; such experiences consist of aesthetic judgment and emotions [17]. The theory of aesthetic attitude and its variants hold that aesthetic cognition aims at the enjoyment of the aesthetic experience and is not directly influenced by the emotions evoked by the object. For example, we are satisfied and impressed by a horror movie because of its aesthetic cognition, not because of the fear it evokes [15, 16, 24, 25]. Accordingly, aestheticians have tried to explain how one can socially transmit one's individual aesthetic experience to others through emotions and how one can thereby consider oneself a member of society [20–22]. Similar to the theory of aesthetic attitude, Lazarus's emotional model holds that before an emotion occurs, people undergo a cognitive process (i.e., judgment or evaluation) of what happens to them, and that this cognitive process is constructed from adaptive relations with the environment [18, 19].
Based on this theoretical background, we assume that an "aesthetic state" of a robot can be constructed from aesthetic imitative interaction with a human, going beyond the emotional state of a robot proposed by Breazeal et al. [9], and that it can contribute effectively to the implementation of the homo-affinity of a robot. It is possible to improve the social behaviour of a robot through aesthetic imitative interactions, in which cognitive processes play a greater role than emotions. In this paper, we propose a new model of a social robot that can enhance its sociability through aesthetic imitative interactions, such as "playful acts" ("freies Spiel", or free play, in Kantian aesthetics), with humans [17, 23].
2. Aesthetic Imitative Interaction between Child and Robot
2.1. Structure of Aesthetic Imitative Interaction
Social and cognitive structures for understanding others by imitating their behaviours and emotions are similar to those for enhancing the social interest of an individual through aesthetic imitations.
However, in the former case (socio-psychological), the goal of imitative behaviours is to identify and understand others' minds or to fulfil one's own needs [13]. In contrast, in the latter case (socio-aesthetic), the goal of imitative behaviours is to have fun by playing with others (robots), which is naturally followed by sociability or social interest [26, 27].
Thus, playing for fun itself is an important aspect of imitative interactions between robots and humans. Mime and theatre involve imitative interactions that are aesthetic in a broad sense [26]. For example, infants undergo aesthetic experiences through playful interactions with their caregivers and they express amusement when unexpected stimuli are suddenly presented to them while playing [27]. We applied the concept of aesthetic imitative interactions to child-robot interaction. Children imitate the emotional behaviours of a robot through aesthetic interactions; they play with the robot by mimicking its emotional facial expressions. The aesthetic interactions between the children and the robot would reinforce their positive social relationships and enhance their social adaptability.
In summary, aesthetic imitative interactions accompany positive emotions, and consequently, the positive emotions motivate future interactions, making them more interesting. Such cyclic procedures (Figure 1) are elaborately designed on the basis of the AIM model, which comprises a cyclic cognitive process of emotional imitative behaviours.

Aesthetic experience model of child-robot imitative interaction based on Meltzoff and Moore's AIM model
By exploiting these procedures, we attempted to obtain a more adequate answer to the question of how a social robot aesthetically interacts with a child, that is, how imitative interactions accompanied by positive emotions provide a cyclic and relatively stable cognitive process while enhancing social attitudes between a child and a robot. To the best of our knowledge, this is the first social robot study to investigate child-robot interaction on the basis of aesthetics.
2.2. Experimental Scenarios and Procedures
In previous studies on emotional imitative interactions between infants and their caregivers, researchers analysed the imitative behaviours of neonates, who served as the target subjects (participants). Although the results were promising, such participants were not capable of performing complex imitations such as playing.
Thus, we tested older children (aged 3 to 5; five males and five females), who could understand the facial expressions of a robot and perform relatively complex acts with it. We examined whether the children could construct aesthetic experiences through imitative interactions based on the robot's facial expressions; such aesthetic imitative interactions provide positive emotions, which motivate affinitive interactions. For this purpose, it is necessary to design playful imitative activity procedures that induce our participants to undergo aesthetic experiences.

Three types of facial expressions of a robot on the aesthetic level: positive (type 1), negative (type 2) and mixed (type 3)
Experimental scenarios were developed on two levels (steps): the aesthetic level and the social level. On the aesthetic level, we presented our participants with three types of facial expressions of a robot: positive emotions (type 1 in Figure 2), negative emotions (type 2 in Figure 2) and a mixture of positive and negative emotions (type 3 in Figure 2). In the type 1 and type 2 experiments, the participants were not expected to undergo aesthetic experiences, whereas in the type 3 experiment, the robot rhythmically displayed facial expressions for playful acts so that the participants would undergo aesthetic experiences.
On the social level, we examined whether there was a change in the social affinity or social attitude of our participants. For this purpose, the participants were asked whether they could remember the robot's name (Ray), which had been conveyed to them by the robot before testing on the aesthetic level.

Scenario of aesthetic imitative interaction between child and robot: aesthetic level (type 3) and social level
The test procedure is as follows: a child comes into a testing room with his/her caregiver and takes a seat opposite the robot. The robot displays one type of facial expression to the child and instructs the child to imitate the expressions in order. The robot praises the child through positive facial expressions when the child imitates it successfully, or reproaches the child through negative expressions when the child fails.
Figure 3 shows a test scenario of aesthetic imitative interaction between a child and a robot, in which the aesthetic level (type 3) and the social level are used together. At the end of the test, we interviewed the child to determine whether he/she enjoyed the mimic game.
To capture the aesthetic states of the participants, we observed the changes in their facial expressions through a video camera and analysed the changes in their emotional states on the basis of those expressions, using Ekman's emotional model [28]. Ekman's model of facial expressions (happiness, sadness, anger, fear, surprise and disgust) was employed to determine whether the imitative play with the robot could aesthetically satisfy the participants. For simplicity, we divided the six facial expressions into two categories to decide the emotional state and aesthetic judgment of each participant (Table 1).
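The two-way categorization of Ekman's six expressions can be sketched as a simple scoring map. The exact assignment in Table 1 is not reproduced here, so the split below (happiness and surprise as positive, the rest as negative) and the aggregation rule are illustrative assumptions, not the paper's coding scheme.

```python
# Assumed two-way split of Ekman's six basic expressions; Table 1 in the
# paper defines the actual assignment, which may differ.
EKMAN_CATEGORY = {
    "happiness": +1,  # positive emotional state / positive aesthetic judgment
    "surprise":  +1,  # assumed positive in the context of playful acts
    "sadness":   -1,
    "anger":     -1,
    "fear":      -1,
    "disgust":   -1,
}

def emotional_state(observed_expressions):
    """Aggregate a participant's observed expressions into -1, 0 or +1."""
    if not observed_expressions:
        return 0  # no expressive change observed: treated as neutral
    total = sum(EKMAN_CATEGORY[e] for e in observed_expressions)
    return (total > 0) - (total < 0)  # sign of the total
```

A participant who mostly smiled and showed surprise would thus be coded +1, matching the -1/0/+1 scale used in the results tables.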
Categorization of facial expressions with respect to the emotional state and aesthetic judgment
Thus, we investigated whether a child undergoes aesthetic experiences through imitative interactions with a robot because he/she performs playful acts with the robot, and not because of the emotional states facially expressed by the robot.
3. Robot Design for Aesthetic Interaction
3.1. Perception and Robot Platform
To interact meaningfully with children, a robot must be able to perceive the world as children do; it should be capable of sensing and interpreting the same environments that children observe. Human-oriented perception is typically acquired through face detection/tracking and speech recognition/generation. We employed the application programming interfaces (APIs) and class components supported by Java and the Android 2.3 (Samsung Galaxy S-II OS version) SDK to implement the perception functions. For face detection and tracking, we used JavaCV, which provides a Java interface to the computer vision library OpenCV. The face detector in OpenCV is based on the Viola-Jones method with Haar-like features; integral image representation and the AdaBoost learning algorithm are used to detect the children's faces [30].
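The integral image (summed-area table) at the heart of the Viola-Jones method can be sketched in a few lines; this is a generic illustration of the representation, not the OpenCV implementation itself. The `region_sum` helper shows why Haar-like features are cheap: any rectangle sum costs four lookups.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x] (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def region_sum(ii, top, left, bottom, right):
    """Sum of the original image over rows top..bottom, cols left..right, in O(1)."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

A Haar-like feature is then just a difference of two or three such rectangle sums, which is what makes exhaustive scanning over many windows feasible.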
In general, speech recognition involves two steps: signal processing (to transform an audio signal into feature vectors) followed by graph search (to match utterances to a vocabulary). Most current systems use hidden Markov models to stochastically determine the most probable match. We implemented a speech recognizer with Google's speech recognition API using the Android SDK [31], which shows high recognition accuracy (its models were trained on 230 billion English words); Korean speech recognition support was added in June 2010. The Samsung TTS (text-to-speech) engine was used for speech generation.
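The "graph search" step can be illustrated with a toy Viterbi decoder over a hidden Markov model. The states, observations and probabilities below are invented for illustration; they are not the models used by the Google recognizer.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for an observation sequence.

    Log probabilities avoid numerical underflow on long utterances.
    """
    # V[t][s]: best log-probability of any path ending in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path
```

In a real recognizer the hidden states would be context-dependent phoneme models and the observations acoustic feature vectors, but the dynamic-programming search is the same.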
RQ-TITAN (Technological Innovation Towards Androids), developed by RoboBuilder Co., Ltd., was used as the robot platform (Figure 4). It is a humanoid platform whose height (84–100 cm) is similar to the average height of 3–4-year-old children. RQ-TITAN has 15 actuators that move its head, chest, arms and legs, which have 2, 4, 2 and 14 degrees of freedom, respectively. The main controller is a Windows-based micro PC with an Intel 2.0 GHz Atom CPU and 2 GB of RAM.

RQ-TITAN: humanoid robot platform
3.2. Facial Expression
In general, the facial expressions of a robot are not life-like, owing to the limitations of mechatronic design and control. The transitions between expressions tend to be abrupt and rapid, and hence, unnatural. Instead of using mechanical actuation, we implemented our robot's face on a smartphone, Samsung Galaxy S-II, using 2D animation techniques (Figure 5) to facilitate natural transitions between expressions and to generate facial expressions with many degrees of freedom. Modern smartphones can serve as efficient computing platforms for robot faces because they can support various functions such as media players, digital cameras, GPS and high-speed data access, along with advanced third-party APIs for convenient integration with the smartphone's OS and hardware.

Robot face developed on smartphone
Eight emotional responses to consumer products in PrEmo
2D animated facial expressions were designed to express emotions based on PrEmo [29]. PrEmo is an instrument that reports emotional responses to consumer products through animations instead of words. It measures a set of 14 emotional responses (indignation, contempt, disgust, dissatisfaction, disappointment, amusement, desire, admiration, fascination, etc.), and each emotion is portrayed using an animated 2D cartoon character as listed in Table 2.
We selected expressive cartoon animations that could clearly convey emotional status to a child while minimizing ambiguity, and we designed three positive, three negative and two neutral expressions (Table 3).
Three positive, three negative and two neutral facial expressions
Seamless transition between facial expressions was achieved via morphing, which is an animation technique for changing one image into another through interpolation. We used FantaMorph [32], a well-known commercial morphing application, to achieve professional-quality morph animation. Figure 6 shows the interpolation images created by morphing a neutral expression into a negative expression using FantaMorph.
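The cross-dissolve component of morphing can be sketched as a pixel-wise linear interpolation between two images. This is only the blending half of what FantaMorph does; a full morph also warps the geometry between corresponding feature points, which is omitted here.

```python
def cross_dissolve(img_a, img_b, t):
    """Blend two equally sized greyscale images: t=0 gives img_a, t=1 gives img_b."""
    assert 0.0 <= t <= 1.0 and len(img_a) == len(img_b)
    return [
        [round((1.0 - t) * a + t * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]

def morph_sequence(img_a, img_b, n_frames):
    """In-between frames for a smooth transition, e.g. neutral -> negative."""
    return [cross_dissolve(img_a, img_b, i / (n_frames - 1)) for i in range(n_frames)]
```

Playing such a sequence at the display frame rate produces the seamless expression transitions described above, with the number of in-between frames controlling the transition speed.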

Morphing neutral expressions into negative ones

Aesthetic interaction between a child and a robot
4. Experimental Results
To verify whether the proposed model can enhance the sociability of children, 10 children in the age group of 3 to 5 years were selected to participate in the study. Figure 7 shows an aesthetic interaction test between a child and the robot. Table 4 lists the demographic and experimental characteristics of the participants. There were five male and five female participants, with a mean age of 3.6 years (SD, 0.7). Eight children participated in the type 1 and type 2 experiments, whereas nine children participated in the type 3 experiment.
Demographic and experimental characteristics of 10 children
After conducting the type 1 and type 2 experiments, we observed the facial expressions of the participants and analysed their emotional states on the basis of Ekman's emotional model [28]. Table 5 lists the overall results of the observed emotional states of the participants. Here, 0 denotes a neutral emotional state, +1 a positive emotional state and −1 a negative emotional state. In general, the children gave a positive response (mean +0.88; SD, 0.35) after the type 1 experiment and a negative response (mean −0.88; SD, 0.35) after the type 2 experiment.
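The reported mean of +0.88 with an SD of 0.35 for eight participants is consistent with seven +1 responses and one 0 response on the -1/0/+1 scale. The response vector below is a reconstruction assumed from those summary statistics, not the raw data from Table 5.

```python
import statistics

# Assumed response pattern consistent with the reported statistics for the
# type 1 experiment (eight participants: seven positive, one neutral).
type1_responses = [+1, +1, +1, +1, +1, +1, +1, 0]

mean = statistics.mean(type1_responses)   # 7/8 = 0.875
sd = statistics.stdev(type1_responses)    # sample standard deviation
```

Rounding both to two decimals reproduces the reported +0.88 and 0.35; mirroring the signs gives the type 2 figures.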
We also observed the facial expressions of the participants after the type 3 (mixed) experiment. In addition, we asked the participants whether they could remember the robot's name and whether they had fun through the interaction. As indicated in Table 6, most of the participants gave a positive response after the type 3 experiment. Furthermore, all participants stated that they had fun.
Child's emotional response for type 1 and type 2 experiments (0: neutral, +1: positive response, −1: negative response)
Children's response for type 3 experiment (0: neutral, +1: positive response, +2: had fun, −1: negative response, −2: did not have fun, O: remember robot's name, X: do not remember)
Of particular note are the results for the robot's negative facial expressions in the mixed version, to which the participants did not show negative responses; moreover, although negative facial expressions were presented to the participants during the type 3 experiment, all of them reported that the interaction was fun.
When asked whether they could remember the robot's name, Child 6 did not respond. The child's parent reported that the child had been unable to hear the robot's name clearly owing to a technical error that occurred when the robot uttered its name. After this problem was fixed, we observed that most participants remembered the robot's name after playing with it.
Contingency table of child's emotional response for type 2 and type 3 experiments
To make this study more informative, we assessed the difference in emotional responses using a statistical test. Fisher's exact test [33] was used to determine whether there was a non-random association between the two variables, i.e., whether the observed difference in responses was significant. From the contingency table in Table 7, we observe that the type 2 experiment yielded seven negative responses and no positive responses, whereas the type 3 experiment yielded two negative and seven positive responses. The two-sided test on this table yields p ≈ 0.003 (< 0.05), indicating that the difference between the two response distributions is statistically significant.
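The two-sided Fisher's exact test on the 2×2 table from Table 7 can be reproduced with a short standard-library sketch that enumerates the hypergeometric distribution over all tables with the same margins; this is a generic implementation for checking the result, not the paper's analysis script.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 contingency table.

    table = [[a, b], [c, d]]. The p-value sums the hypergeometric
    probabilities of all tables (with the same margins) that are no more
    likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_of(x):  # probability of a table with x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_of(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all admissible tables no more probable than the observed one
    return sum(p_of(x) for x in range(lo, hi + 1) if p_of(x) <= p_obs + 1e-12)

# Type 2 (7 negative, 0 positive) vs. type 3 (2 negative, 7 positive)
p = fisher_exact_two_sided([[7, 0], [2, 7]])
```

With these counts the exact test gives p ≈ 0.003, well below the conventional 0.05 threshold.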
From these results, it is observed that the emotional responses in the type 2 experiment were quite distinct from those in the type 3 experiment, although negative facial expressions were presented to the participants in both; the cognitive processes (the aesthetic judgment involved in the aesthetic experience) induced the positive responses in the type 3 experiment. Thus, the results highlight the effectiveness and potential utility of the proposed approach.
5. Discussion
This study corroborates the claim that a socio-aesthetic view can be applied to HRI research. Although several studies on social robots have reported an increase in robot sociability through emotional imitative interactions between robots and humans, our study was primarily motivated by aesthetic theories holding that mimetic experiences, such as theatrical performances, can promote social relationships.
Perhaps the most interesting aspect of our study is the following: according to the Kantian aesthetic view, aesthetic experiences are based on human cognitions and emotions, and cognition and aesthetic judgment play a greater role in aesthetic experiences than do emotions. If so, is this view also valid for imitative interactions between humans and robots? Our intuitive answer is that aesthetic judgment (or aesthetic preference) has a more direct influence on the social affinity between a robot and a human than do other emotional factors. Indeed, we demonstrated that aesthetic judgment is more influential than emotions in playful imitative interactions between humans and robots.
Relatively young children (3–5 years of age) were selected to participate in the experiments. They were divided into three groups. Two groups were required to respond to positive or negative emotions, while the rest were required to imitate the robot's facial expressions. The robot's facial expressions presented to the participants represented positive and negative emotions. The positive and negative facial expressions were presented rhythmically, in rotation. Thus, the participants in the first two groups responded to each positive and negative emotional facial expression of the robot with the right emotions; in other words, children who saw positive facial expressions gave positive emotional responses, and those who saw negative facial expressions gave negative responses. On the other hand, children who played interactively with the robot playfully imitated the robot's facial expressions and stated that they had fun. Thus, the children in the third group were interested in playing with the robot instead of simply responding to the different facial expressions of the robot. Therefore, considering the playful interaction with the robot as “fun” or “interesting” can be regarded as an aesthetic judgment under conventional aesthetics. This finding reinforces the claim of Kantian aesthetics that aesthetic judgment precedes emotion or that cognitive processes precede emotional processes (or accompany them) during aesthetic experiences.
In addition, this study presented the process underlying the relation between aesthetic experiences and social affinity in child-robot interactions. Whether one can remember another's name may be a basic indicator of the level of sociability among members of a group. Thus, we also tested whether the children could remember the robot's name after playing with it, in order to confirm that aesthetic experiences enhance the social interest of the participants. Remarkably, almost all the participants were interested in the playful imitative interactions and remembered the robot's name. This finding supports our belief that there can be a repetitive and stable cognitive process of aesthetic imitative interactions between humans and robots. We presented some tangible evidence in Section 4; however, we were unable to provide a comprehensive explanation of the process because we could not recruit a sufficient number of participants; moreover, the model of this process was not elaborately embedded into the robot system. Thus, further investigation is required in this regard.
Lastly, although the current results may seem remarkable, we must note that our study has certain limitations, as is the case with any work in a new research field. First, the children who participated in our experiments were relatively young, which has both advantages and disadvantages. They were interested in imitative play with the robot, but they could not verbally express their aesthetic state of mind with sufficient clarity. Further, a few of them recalled the robot's name ("Ray") incorrectly as "Rao", "Re", etc., although they reported that they had fun while playing with the robot.
Obviously, it is necessary to develop more advanced tools for observing the participants in order to gain a better understanding of the relationship between aesthetic experiences and social affinity. Moreover, there is the problem of generalization. We restricted our study to relatively young participants for effective playful interactions with the robot; however, for broader generalization, future studies must draw samples from a larger and more diverse population, including students in the lower elementary grades.
Further, it should be noted that the appearance of the robot itself could interfere with the aesthetic judgment or emotional responses of the participants. Indeed, a few participants cried and avoided the robot before it presented its facial expressions; nonetheless, this had an insignificant effect on the results. We controlled only the facial expressions of the robot; in future studies, we plan to examine the effects of its appearance in detail.
6. Conclusion
We demonstrated that aesthetics influence HRI and that the relationship between aesthetic experiences and the imitative social learning of a robot involves a repetitive, stable cognitive process. Moreover, we explained that in this relationship, cognitive factors, such as aesthetic judgment, precede emotional factors. The results of the present study are consistent with socio-aesthetic theories, which emphasize that aesthetic activities increase inter-subjective communication in society, and with the social objectives of HRI, in which interactions between humans and robots are re-examined from sociological perspectives. Finally, we expect these results to shed new light on the role of aesthetics in HRI.
