Abstract
1. Introduction
Research on virtual reality and games still needs considerable improvement, especially in how to immerse users and provide them with attractive interactions. Interactivity and immersiveness are considered the main goals of virtual reality games (VRG), particularly for educational purposes, virtual training and virtual institutions. The avatar is therefore responsible for providing better interaction in virtual environments (Basori et al., 2009a, Basori et al., 2008b, Bogdanovych, 2007, Roussou, 2004, Yahaya, 2009). Why do virtual reality games need improvement? The reason is that virtual reality applications are not as natural as the real world. Acosta (2001) mentioned that realistic virtual reality applications could “
Controlling the expressions of avatars is another issue facing those conducting research into facial animation. They can change the facial animation controller from items such as a mouse, a keyboard or a joystick to a camera tracker, a special glove or a brain-computer interface (Basori et al., 2011a).
Faller et al. (2010) have proposed a brain interface that can be used by disabled and non-disabled people alike. The brain signals used in their research are steady-state visual evoked potentials (SSVEPs), which provide a fast information transfer rate (Faller et al., 2010). Other researchers have focused on controlling avatars by using conversations across a chat device and camera tracking (Neviarouskaya et al., 2009, Zhan et al., 2007).
2. Background to the Research
Facial expression and walking are features needed for humanoid avatars or robots to interact with humans. Motion planning in robotics plays an important role in allowing robots to be able to move intelligently. Algorithms such as the hierarchical memetic algorithm (MA) have been used for motion planning in robotics (Lin et al., 2012). This paper will focus on walking behaviour based on emotional conditions rather than on motion planning.
Research on the facial expression of emotion was initiated by Ekman (1982). A standard guideline for the emotional facial expression of humans, called FACS (Facial Action Coding System), was introduced in 1978 (Ekman and Friesen, 1978). Work on facial expressions has driven improvements in facial animation, such as the parameterized facial model (Parke, 1971), facial rigging based on blend shapes (Neuberger, 2010), facial rigging using clusters (Grubb, 2010) and facial rigging using interfaces (James, 2010). Fabri et al. (1999) showed that non-verbal communication in Collaborative Virtual Environments (CVEs) can be performed using face, gaze, gesture or even body posture. Researchers continue to extend this work by providing human likenesses that increase interaction and communication between computer and user (Fabri et al., 1999). Wang et al. (2005) mentioned that there are two main problems in creating a virtual human: first, the construction of emotion, and second, the generation of the affection model, which is created to improve the virtual human's presentation. The avatar does not only represent a human physically; it also needs some context to make it believable. Current 3D humanoid models require improvement because they lack believability (Rojas et al., 2006). Rojas et al. (2006) proposed an individualization method that gives the 3D humanoid model personality, emotion and gender. Zagalo and Torres (2008) suggested that the 3D humanoid model can become a character and express its emotions through an act of touching between two characters. Melo and Paiva (2007) made some innovations in expressing the emotion of virtual characters without relying on body parts. They used elements like shadow, light, composition and filter as tools for conveying the characters' emotion (Melo and Paiva, 2007).
Other researchers also use lighting and colouring on avatar facial expressions in order to strengthen the impression of avatar emotions (Basori et al., 2010, Basori et al., 2011b, Basori et al., 2012).
With regard to humanoid avatars, researchers have reached the stage where they add emotion to the avatar. Here, a different approach for expressing avatar emotions is proposed. Table 1 compares the avatar and facial expression controls reviewed in the existing works.
Summary of existing avatar and facial expression research
Based on the existing facial expression controls for avatars, there is still room for improving natural interface controls. The previous discussion shows that facial expression control using brain-computer interfaces has not been greatly explored. From the research point of view, this challenge opens a new landscape to explore and new models of natural interaction to propose.
3. Critical Analysis of Facial Expression Control
Functions in facial rigging are responsible for controlling joints, blend shapes and clusters to manipulate the face surface of the 3D model. Functions can be written as equations that relate control parameters to the expected effect on the face surface. They can also be extended into a user interface to give the user easier control over the facial region. In GUI mode, each control value for joint angles, cluster transformations, blend shapes and functional expressions has a particular key frame position or time. Using a GUI on a particular desired area to create effects makes it easy to control the facial expression of an avatar. An example is shown in Figure 1.

Graphical user interface to control facial expression, Boris with Facial GUI 1.0.0 in Maya (James, 2010)
The facial expression coding system proposed by Ekman (Ekman, 1982, Ekman, 2003, Ekman and Friesen, 1978) has six basic emotions: anger, joy, sadness, fear, disgust and surprise. These emotions are used as a basis for creating the emotionally expressive avatar. Continuing this line of research, Faigin (1990) presented the popular argument that emotions are mainly determined by three meaningful regions, namely the eyebrows, eyes and mouth, which became the universal expression of the avatar (see Fig. 2). The expression of anger draws the eyebrows close to each other and lower than their normal position. In strong anger, a person will usually open their mouth or even shout (see Fig. 2-A). Joy or happiness is shown through a relaxing of the facial muscles: the lips are widely opened and the eyebrows seem calm (Fig. 2-B). Sadness makes the eyebrows appear to stretch upwards, and the mouth is closed but not tightly. The lower eyelids are pulled downwards to make crying eyes (Fig. 2-C).

Re-illustration of Universal Expression using Xface
The expression of fear pulls the eyebrows upwards and close to each other. In addition, the eyes are widely opened but are dragged towards the upper part of the facial region. The lower lip receives more pressure than the upper lip (see Fig. 2-D). To show disgust, the eyebrows, eyelids and eyes are pulled together. The area near the nose is pulled and raised, while the mouth is half open and the other parts seem to be closed (see Fig. 2-E). To express surprise, the eyes are widely opened, the eyebrows and eyelids are raised, and the mouth is open but in a relaxed position (see Fig. 2-F). Another sample of avatar facial expressions is shown in Fig. 3.

Sample of Virtual human (3D Humanoid Model) in smiling mode
Figure 3 shows an emotional expression of happiness, with a wide, closed smile. In addition, the inner and outer eyebrows look more relaxed and the character is shown walking with full confidence. The previous discussion has covered the techniques that are widely used in facial animation applications. Interpolation is one of the best-known of these techniques, and it is further enhanced by facial rigging to provide the user with easy interaction control. This study uses blend shape interpolation of the facial region to perform emotional facial expressions. The interpolation starts from a neutral expression named ‘base’, which then changes into a desired pose according to the interpolation value. Bee et al. (2009) presented an emotional facial expression control using an Xbox joystick. The user can create a particular facial expression by pressing a specific button on the joystick, which helps the user to interact with the avatar's facial expression. That approach inspired this research to develop another control for facial expression using brain activity and hand tracking.
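The blend shape interpolation described above can be illustrated with a minimal Python sketch. It assumes plain linear interpolation between the vertex positions of the neutral ‘base’ shape and those of a target pose; the function name `blend`, the clamping of the interpolation value to [0, 1] and the two-vertex toy mesh are illustrative assumptions, not details taken from the system itself.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def blend(base: Sequence[Vertex], target: Sequence[Vertex], weight: float) -> List[Vertex]:
    """Move every vertex from the base shape toward the target shape.

    weight = 0.0 returns the base shape, weight = 1.0 the target pose.
    Clamping the weight to [0, 1] is an assumption of this sketch.
    """
    if len(base) != len(target):
        raise ValueError("base and target must have the same number of vertices")
    w = min(max(weight, 0.0), 1.0)
    return [
        tuple(b + w * (t - b) for b, t in zip(base_vertex, target_vertex))
        for base_vertex, target_vertex in zip(base, target)
    ]


# Toy mesh (hypothetical): two mouth-corner vertices raised by one unit in the target.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smile = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
half_smile = blend(base, smile, 0.5)
print(half_smile)
```

In such a scheme the interpolation value plays the role of the expression strength: intermediate weights give intermediate poses between the neutral face and the full expression.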
4. Methodology
Emotions are also expressed by changing the lip shape. By referring to FACS, we have created certain facial expressions for an avatar, such as anger and happiness; we concentrate on these two emotions. According to theories of emotion, anger is usually related to something that makes people feel uncomfortable. The aforementioned facial expressions involve several Action Units but do not consider the intensity or level of each emotion, because they are mainly concerned with producing a realistic imitation of an emotional expression (Arya et al., 2009, Villagrasa and Susin, 2009). The emotional intensity is useful for the integration process that forms part of rendering the facial expression. Melo and Paiva (2007), on the other hand, proposed using lighting effects to increase the emotional expression of their avatar. This study takes a different approach as its second contribution, combining previously researched approaches: reducing the Action Units involved in the rendering process, adding lighting effects based on colour value, and connecting the facial expression control with external inputs such as mind controllers and gloves. According to previous researchers, controlling facial animation is considered a contribution to facial animation research (Bee et al., 2009). The methodology of the system is shown in Figure 4.

Methodology of system
Russell (1980) stated that ‘angry’ has a high Y value and a high Negative (−X) value (see Figure 5). In addition, researchers also study certain levels of colour, saturation and brightness that carry some emotional information like feelings of joy or sadness (Melo and Paiva, 2007).

Circumplex model of affect-emotion (Russell, 1980)
Muscle and alpha signals from the user are used to determine their emotional classification. These signals are classified using the Circumplex model. Happiness and anger are the two emotions that are clearly detected in this experiment. Happiness lies on the pleased axis and slightly towards excited, while anger is high in excitation but unpleased. Based on these criteria, we turn the muscle and alpha signals into specific emotions. In addition, another input, the 5DT glove, is used to adjust the intensity of the emotion according to the finger tracking.
5. Simulation Results
The Nia mind controller has several sensors attached to the user's head and is able to record brain activity during the interaction. The mind controller recognizes and analyses the brain activity signal and produces a classifying signal based on the emotional condition. The glove, in turn, produces a signal that is interpreted according to the shape of the hand gesture. In this system, the glove acts only as the intensity controller of the emotional expression, e.g. if the user's emotion is recognized as anger, then the strength of the anger is decided by the gesture shape. Following previous work, this study also uses gesture postures, such as the rounded fist gesture proposed by Mubin et al. (2007) to represent anger. After reading the input, the system continues the process in the OGRE game engine, which loads the 3D facial model and prepares the interaction management to communicate with external input/output (I/O) APIs such as a glove API, a mind controller API, a sound API or a haptic API. The system then produces its output: a facial animation with natural interaction control using brain signals and finger tracking. The details of the process are described in Figure 6.

Sequence of Natural Interaction with avatar
The computation of facial expressions is based on a calculation for each Action Unit, as shown in Figure 7. Each Action Unit is responsible for the strength and type of the emotion expressed. In this case, the expression of anger is more complex than the expression of happiness, and the complexity increases as the level of anger rises (refer to Figure 7).
AU1 and AU2 are used to control expression near the eyebrows, while AU4 and the level parameter control the degree of wrinkling and raise the forehead area. AU9 strengthens the expression of anger by creating wrinkles near the nose. AU15 and AU17 control the corner of the lip and the appearance of the chin. The expression of happiness has different characteristics; four controls are involved in expressing happiness as an emotional condition (see Figure 8).

Happiness calculation based on Action Units
AU1 and AU2 manage the eyebrow muscle and, together with AU4, perform the expression of happiness. Lip control and AU15 manage lip movement while the emotion is being generated. All elements work together to produce the appearance of happiness, while the level is used as a power control that determines the strength of the happiness expression to be rendered. Figure 9 shows happiness in the facial expressions of avatars at various levels of happiness.

Various happy expressions
Nia is a mind controller device that consists of several sensors:
Glance sensor, showing the amplitude of the eye movement signal.
Muscle sensor, reflecting the muscle tension and level of excitation.
Alpha 1, 2, 3 and Beta 1, 2, 3 sensors, showing the level of activity inside the brain.
All these sensors correlate with excitation or level of activity happening inside the human body. For that reason, these sensors are suitable for recognizing the human emotion through the level of tension in the brain or the forehead facial muscle.
The sensors described in Figure 10 are able to recognize changes in brain activity and muscle tension during interaction between the user and the 3D avatar. The glance sensor is not used in this study because eye muscle movement is not used as an input stimulus in this system. Ekman and Hager (1983) proposed FACS, which combines several Action Units (AUs) to recognize and produce emotion from facial expression. The FACS regions in Figure 11 show that muscle tension has a high correlation with emotional condition. Therefore, this system chooses the muscle sensor and one of the Alpha sensors as stimulation inputs. The Alpha and Beta sensors use a similar method of measuring brain activity, which is why Alpha1 alone is used to represent brain activity. The mean of the muscle and Alpha signals is used as the final input for the stimulation, so the emotional recognition depends on the average signal intensity of the muscle and Alpha sensors (refer to Equation 1).

Brainfinger according to the Nia Sensors

FACS regions
$$E_s = \frac{M_s + \alpha_1}{2} \qquad (1)$$
Note:
$E_s$: Emotion Signal
$M_s$: Muscle Signal
$\alpha_1$: Brain Activity Signal
Equation 1 calculates the emotional signal from two sources: the muscle signal and the Alpha signal. The emotional signal determines which emotion will be performed by the facial expression, haptic vibration and acoustic effects. The tension level of the muscle and Alpha signals can be divided into four main zones, Z1, Z2, Z3 and Z4, as shown in Fig. 12. Z1 to Z4 are the intensity zones of each signal, by which the level of excitation is classified, as shown in Fig. 9. These four zones can be understood as low, medium lower, medium higher and high. Low and medium lower correspond to Z1 and Z2 and are intended to capture the feeling of relaxation, which is suitable for happiness. Z3 and Z4, by contrast, are the medium higher and high zones, which are associated with high levels of tension. This high level of tension makes the muscles appear stressed, as when anger occurs, which is why Z3 and Z4 are used to stimulate the expression of anger.

Zones level of Nia sensor
The facial expression is stimulated and affected by these signals, classified into zones. For example, if the signal reaches Z3 or Z4, the 3D humanoid enters anger mode: its face changes into a state of anger and the magnitude of the vibration force is stimulated to represent anger. If the signal instead decreases to Z1 or Z2, the stimulation changes to happiness, which puts the 3D humanoid into happiness mode with a happy facial expression. The magnitude of the force and the acoustic effects are adjusted according to the emotional state as well.
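The classification just described can be sketched in a few lines of Python. The sketch takes the emotional signal as the mean of the muscle and Alpha1 signals, as stated in the text, and maps it to one of the four zones. The text does not give numeric zone boundaries, so the sketch assumes signals normalized to [0, 1] and four equal-width zones; the function names are likewise hypothetical.

```python
def emotion_signal(muscle: float, alpha1: float) -> float:
    """Mean of the muscle signal and the Alpha1 signal (Equation 1 in the text)."""
    return (muscle + alpha1) / 2.0


def zone(signal: float) -> int:
    """Return the zone number 1..4 for a signal in [0, 1].

    Equal-width zones are an assumption of this sketch; the real
    boundaries of Z1..Z4 are not stated in the text.
    """
    s = min(max(signal, 0.0), 1.0)
    return min(int(s * 4) + 1, 4)


def classify(muscle: float, alpha1: float) -> str:
    """Z1 and Z2 map to happiness, Z3 and Z4 to anger, as in the text."""
    return "anger" if zone(emotion_signal(muscle, alpha1)) >= 3 else "happiness"


print(classify(0.25, 0.25))  # relaxed signals fall in a low zone
print(classify(0.75, 1.0))   # tense signals fall in a high zone
```

Only the two-way split between the lower zones and the upper zones matters for choosing the emotion; the finer zone number could additionally drive the vibration magnitude and acoustic effects mentioned above.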
As mentioned in the previous section, inputs into this system come from a 5DT glove or a Nia mind controller, which act as stimulators from the user. The user can choose whether to use the glove, the mind controller or both. However, the glove is only used as the intensity controller for emotional expression, e.g. if the emotion is anger, the user makes a rounded fist gesture that strengthens the intensity of the anger. This intensity also affects the magnitude of the force.
Features for controlling the intensity of the avatar's emotion with a data glove are based on calculations of each sensor position. The fingers' movement is read by a sensor, which then sends a signal containing the position of each finger. The finger position value is used to calculate the intensity of emotion by comparing it with the maximum value from the finger sensor. Only two emotions are involved in the stimulation process: anger and happiness. To use these intensity values, a threshold therefore needs to be defined in order to classify the intensity. This threshold is divided into two: the anger threshold and the happiness threshold. The anger threshold is defined as the minimum finger position value for the hand to be considered as having the anger shape (refer to Figure 13).

Hand shape for anger threshold (minimum finger shape for anger)
Figure 14 shows the neutral position, in which the value for each finger is 255 (the maximum value for the finger position). Finger tracking with the 5DT glove shows that the thresholds for anger and happiness have a similar value. If the fingers in Fig. 14 have the maximum value of 255, the finger value in Fig. 13 can be assumed to be half the maximum value, as shown in Table 2. The happiness mode is slightly different, because all the fingers except the thumb need to be closed tightly, while the thumb stays at a middle position between fully closed and open (flat position). Consequently, the intensity is calculated only from the position of the thumb.
Threshold for Anger and Happiness

Flat hand shape for neutral
The threshold for anger is 128, and the value decreases to zero as the hand reaches the fully closed, tight shape shown in Figure 15.

Fully closed hand shape for the strongest anger.
On the other hand, the happiness threshold starts from 128 for the thumb and zero for the other fingers. The thumb value can then increase up to 255 to perform the strongest happiness, as shown in Figure 16.

Thumb open and all finger closed for strongest happiness.
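The threshold rules above can be expressed as a small Python sketch. The values 255 (flat finger) and 128 (threshold) come from the text; scaling the result to an intensity between 0.0 and 1.0, averaging the finger values for anger, and the function names are assumptions made for illustration.

```python
from typing import Sequence

MAX_VALUE = 255   # flat, neutral finger reading (from the text)
THRESHOLD = 128   # shared threshold for anger and happiness (from the text)


def anger_intensity(fingers: Sequence[int]) -> float:
    """0.0 at the anger threshold, 1.0 for a fully closed fist (all values 0).

    Using the mean of all finger values is an assumption of this sketch.
    """
    mean = sum(fingers) / len(fingers)
    if mean > THRESHOLD:
        return 0.0  # hand is more open than the minimum anger shape
    return (THRESHOLD - mean) / THRESHOLD


def happiness_intensity(thumb: int) -> float:
    """0.0 at the happiness threshold, 1.0 when the thumb is fully open (255).

    Only the thumb is considered, as stated in the text.
    """
    if thumb < THRESHOLD:
        return 0.0
    return (thumb - THRESHOLD) / (MAX_VALUE - THRESHOLD)


print(anger_intensity([64, 64, 64, 64, 64]))  # halfway between threshold and fist
print(happiness_intensity(255))               # thumb fully raised
```

Because the emotion itself is chosen by the mind controller, the two functions never compete: the system would call only the one that matches the recognized emotion.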
The value of the finger position determines the strength of the emotion performed by the facial expression. Figure 17 shows pseudocode that describes the details of the finger sensor reading process using the function fdgetSensorScaled.

Pseudocode for finger position tracking
Each finger is captured by two driver sensor indices that differ from one another. The smallest index starts at the thumb and the driver sensor index rises to 13. Indices 16 and 17 correspond to pitch and roll capture. Table 3 lists the glove sensors; almost all of them are correlated with the intensity control of emotion, except pitch and roll. ‘Flexed’ is the condition of the finger in an open position and ‘unflexed’ is the condition where the finger is in a closed position, as shown in the previous examples (Figures 13–14).
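As a rough illustration of the reading process, the Python sketch below collects one value per finger from two sensor indices each. The pseudocode of Figure 17 and the contents of Table 3 are not reproduced here, so several details are assumptions: the index pairs in `FINGER_SENSORS`, the averaging of the two sensors of a finger, and the `read_sensor` callable, which is a hypothetical stand-in for the glove driver call and not the real SDK interface.

```python
from typing import Callable, Dict

# Assumed index pairs per finger (two sensors each, thumb lowest, up to 13).
# The actual assignment is given in Table 3 of the paper.
FINGER_SENSORS = {
    "thumb": (0, 1),
    "index": (3, 4),
    "middle": (6, 7),
    "ring": (9, 10),
    "little": (12, 13),
}


def read_fingers(read_sensor: Callable[[int], int]) -> Dict[str, int]:
    """Return one value in 0..255 per finger.

    read_sensor is a hypothetical callable that returns the scaled
    value of one sensor index; averaging both sensors of a finger is
    an assumption of this sketch.
    """
    values = {}
    for finger, (first, second) in FINGER_SENSORS.items():
        raw = (read_sensor(first) + read_sensor(second)) // 2
        values[finger] = min(max(raw, 0), 255)  # keep within the 0..255 range
    return values


# Simulated glove: flat hand (255 everywhere) with the thumb half closed.
fake_glove = {index: 255 for index in range(18)}
fake_glove[0] = fake_glove[1] = 128
print(read_fingers(fake_glove.__getitem__))
```

Passing the reader in as a function keeps the sketch independent of any particular driver, so the same loop can be exercised with simulated data as shown.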
5DT glove sensors (Technologies, 2000)
Note: Asterisk (*) signifies the same value of sensor as when 5DT data glove is used.
The data from finger tracking as discussed before ranges from 0 to 255. The value of the finger sensor can vary, as shown in Table 4.
Data sample for finger tracking
The facial appearance of the avatar changes according to finger position movements. Refer to Figs. 18–22 for illustrations of interpreting emotion with the mind controller and controlling the intensity of emotion using a glove. The mind controller records the tension level of the user's facial muscle and brain activity. If the tension reaches an anger zone (Z3 or Z4, refer to Figure 9), then the emotion is recognized as anger. The glove is designed to capture finger shape, e.g. if the user makes a “fist” in the glove, it is interpreted as “intensity for anger,” and the intensity of the anger changes according to this value. On the other hand, if the muscle or Alpha signal drops to Z1 or Z2, then the emotion is interpreted as happy.

Happy expression through style of walking

Controlling expression and mode of walking, indicating happiness, through mind controller and hand glove

Anger expression through style of walking

Controlling expression and mode of walking, indicating anger, through mind controller and hand glove

Happy Walking Transition
Figs. 21 and 22 show expressions of anger while the humanoid avatar is walking, controlled by the brain signal and hand gestures. The transition of emotional walking is shown in Figs. 23 and 24.

Angry Walking Transition
Fig. 19 illustrates how the user shows the emotion “happiness” by smiling and controls the intensity of the emotion by raising the thumb. Two main finger positions are considered: a full fist for anger and a ‘thumbs-up’ position for happiness.
6. Conclusion
User feedback was encouraging, with 67% of users giving a strong and positive response to the system. The use of the brain interface and glove is believed to give users a strong impression and sense of believability, and even to strengthen the interactivity and immersiveness of a virtual reality or robotic application. This may be because natural interaction is more attractive and more interesting for most users of games or virtual realities. This work has wide scope for future development, especially if it is used to express other emotions such as sadness, disgust, surprise or even an extreme expression. Other signals, such as Beta, Mu, Theta and Delta, have not been used in this study. More detailed emotion recognition will be addressed in a further study as emotion recognition processes develop.
