Abstract
Keywords
1. Introduction
Important findings in brain and memory research over the last few decades have emphasized the potential health benefits of engaging in both cognitively and socially stimulating activities. For example, both social and cognitive stimuli have been found to promote the psychological well-being of older adults [1] and minimize the risk of social isolation, which can negatively impact an elderly individual's health, for example, through an increased risk of dementia [2] and a higher likelihood of coronary heart disease [3]. In addition, studies have shown that cognitive activity throughout one's lifetime, including the early and middle life stages, can help reduce the risk of late-life cognitive decline [4] and is related to a person's semantic memory, perceptual speed and visuospatial ability [5].
Our work focuses on providing needed insight into the use of innovative robotic technologies for person-centred cognitive interventions. Namely, our objective is to develop intelligent, socially assistive robots that can provide cognitive stimulation during social human-robot interactions (HRI) [6-9]. In the future, such robots could be used as aids in providing cognitive training and social interaction in both health-related and educational fields, for example, to: 1) older adults, including those suffering from cognitive impairments, 2) children and adults with attention deficit hyperactivity disorder, brain injuries or learning disabilities, and 3) individuals with major depressive disorder that affects cognitive functioning.
To date, only a handful of research groups (including our own) have focused on developing life-like socially assistive robots to engage different individuals in varying socially or cognitively stimulating activities [10-18]. For example, the seal-like robot Paro [10] has been designed to engage elderly persons, including those with dementia, in animal therapy scenarios by learning which of its behaviours (i.e., moving its body parts and making seal sounds) a person desires, based on how the person pets, holds or speaks to it. KASPAR, a child-sized tele-operated humanoid robot, engages children with autism in imitation play games by displaying various facial expressions, waving its hand, and drumming on a tambourine [13].
Our own recent work in this area has focused on the development of the intelligent human-like robot Brian 2.0 (Figure 1) [16-18]. Brian 2.0 is being designed as a therapeutic tool to engage people in personalized cognitively stimulating activities, providing them with an avenue to interact and socialize during the course of the activities. The significance of using a human-like social robot lies in the ability to directly incorporate a person's natural communication capabilities as well as his/her ability to understand these forms of communication.

The socially assistive robot Brian 2.0
In this paper, we present the design of a novel learning-based HRI control architecture for Brian 2.0 which will enable the robot to effectively engage an individual in one-on-one person-centred cognitively stimulating activities. In particular, the architecture allows Brian 2.0 to be a social motivator by providing assistance, encouragement and celebration during the course of an activity. A hierarchical reinforcement learning (HRL) approach is used in the architecture to provide the robot with the ability to: (i) learn appropriate assistive behaviours based on the structure of the activity, and (ii) personalize an interaction based on a person's user state as defined by a combination of affective arousal and activity performance. The architecture uniquely focuses on bidirectional emotion-based interactions between an individual and the robot in order to promote the cognitive and social well-being of a person.
The novelty of our proposed control architecture lies in the inclusion of: 1) a user state recognition and analysis module to allow the robot to be able to identify a person's user state during the course of an activity, 2) a robot emotional state module which provides the robot with emotional states that are consistent with its contextual assistive interactions and that will aim to elicit an appropriate response from the user while also responding appropriately to a person's user state, and 3) the first use of an on-line learning MAXQ, [19], HRL technique to provide the robot with the ability to adapt to new people and learn appropriate assistive behaviours in order to engage in personalized one-on-one HRI.
2. Learning Strategies for Socially Intelligent Robots
It is envisioned that robots will need to have social intelligence in order to be effectively integrated into human society. Social intelligence allows a robot to share information with, relate to, and interact with humans. HRI research involves empowering a robot with the social functionalities needed to engage human participants in different types of interactions. A number of these characteristics will need to be formulated via the study and development of social learning capabilities for robots.
Recently, a number of socially intelligent robots have been developed that are capable of learning their behaviours for social HRI scenarios. A common approach has been to utilize reinforcement learning (RL) strategies to solve HRI control problems that are modelled as either a Markov decision process (MDP) [20-23] or a partially observable Markov decision process (POMDP) [24], where the latter deals with noise and state uncertainty. Other approaches have focused on utilizing policy gradient reinforcement learning (PGRL) when there is no obvious notion of state, e.g., [12], [25].
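As a concrete illustration of the formulation these MDP-based approaches share, the sketch below shows a single tabular Q-learning update; the state and action names in the usage example are hypothetical placeholders, not taken from any of the cited systems.

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update for an MDP-modelled interaction.

    Q maps (state, action) pairs to values; unseen pairs default to 0.
    """
    # Greedy one-step lookahead over the actions available in the next state
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    # Standard temporal-difference update toward r + gamma * max_a' Q(s', a')
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

For example, a robot rewarded (+1) for a prompt that re-engages the user would nudge the value of that (state, action) pair upward by alpha times the temporal-difference error.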
The main limitation of the RL approaches currently used in HRI applications is their scalability. RL algorithms treat the entire state space as one large search space; hence, the search space can grow exponentially with the number of state variables, increasing the complexity of the algorithm. This results in RL algorithms having slow learning rates and becoming intractable for large state spaces. Therefore, these methods can only be practically applied to small-scale real-world systems.
Hierarchical reinforcement learning (HRL) methods have also been proposed for HRI scenarios. In the case of HRL, the decision making problem is decomposed into a collection of smaller sub-problems so that they can be solved more efficiently [19]. This results in faster learning as the value function requires less data to be learned. For example, in [26], a hierarchical POMDP approach was implemented in the dialogue-based guidance task of the Pearl robot in order for the robot to perform tasks such as reminding a person of an appointment, navigation and/or information assistance. The control policy was computed off-line; hence, during task execution, the controller simply looked up the appropriate robot action to be implemented. No on-line training was implemented.
Our own initial work in this area has focused on using
In our present work, we propose the use of a MAXQ HRL approach, within a multi-layer control architecture, to enable the socially assistive robot Brian 2.0 to provide assistance and encouragement to individuals as they engage in a cognitively stimulating activity. Brian 2.0 is capable of encouraging natural interactions between an individual and itself through social learning and its physically expressive capabilities. The objective of the HRI controller is to determine an individual's user states during a cognitively stimulating interaction with Brian 2.0 and, in turn, determine the appropriate behaviour of the robot to reflect the task to be completed given a particular user state. A modular design approach is applied to the overall control architecture, allowing for the addition and/or substitution of different sensor modalities as needed based on the intended activity.
By using a MAXQ approach, the robot's overall assistive task can be decomposed into smaller, more manageable sub-tasks that it can learn concurrently. This makes our MAXQ approach scalable, allowing us to effectively expand our architecture to include more activities and robot behaviours. Since MAXQ also reduces memory requirements, we can also have the robot interact with a larger number of users. Furthermore, as our objective is to improve/maintain positive user states during the course of a cognitively stimulating activity with the robot, we perform on-line training so that the robot's assistive behaviours can be personalized to each individual's user states.
3. Social HRI Scenario
Our goal is to design a robotic social motivator to provide interventions that focus on maintaining and strengthening the cognitive abilities of a person, while promoting engagement in a cognitively stimulating leisure activity. We have used two criteria identified in the literature to design the cognitive intervention that Brian 2.0 can provide to individuals in order to better engage them in an activity of interest and increase positive affect: 1) we focus on matching the stimuli provided by the robot to a person's skill and interest level, and 2) the robot is designed to provide one-on-one social stimuli.
In this work, we have chosen the card game of memory as our cognitively stimulating activity. The game consists of 16 picture cards turned face down in a 4×4 grid formation. The objective is for the human player to flip over pairs of cards and match the pictures on the cards correctly. Once a pair has been matched, the two cards are removed from the game. The game is over when all cards have been matched. Individuals play the game as single players while the robot autonomously provides preferred amounts of social stimulation in order to keep these individuals engaged in the game. The memory functions within the brain that are trained while playing this card game include the visual object memory and the updating function of the central executive component of the working memory [27].
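The game logic itself is straightforward; the following minimal sketch (with hypothetical function names) illustrates the 4×4 board of paired picture cards and the match-checking rule described above.

```python
import random

def new_board(pairs=8, seed=0):
    """Build a 4x4 memory board: each picture id appears exactly twice,
    shuffled and laid out face down in row-major order."""
    rng = random.Random(seed)
    cards = list(range(pairs)) * 2
    rng.shuffle(cards)
    return cards  # index = grid position, value = picture id

def flip_pair(cards, matched, i, j):
    """Flip positions i and j; if the pictures match, remove them from play
    by recording their positions in the `matched` set."""
    if i != j and i not in matched and j not in matched and cards[i] == cards[j]:
        matched.update((i, j))
        return True
    return False
```

The game ends once `matched` contains all 16 positions.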
We aim to keep a person stimulated and engaged in the memory game activity. In order to do this, herein, we focus on reducing activity-induced stress of the person. Activity-induced stress is known to result in negative moods and lead to disturbances in motivation (e.g., loss of task interest) and cognition (e.g., worry) [28].
4. HRI Control Architecture
A generic modular learning-based HRI control architecture is proposed to allow the robot to provide encouragement and assistance to a person as he/she engages in a cognitively stimulating activity. The HRI control architecture focuses on determining the person's user state and his/her task performance during a cognitively stimulating interaction with Brian 2.0, and adjusting the behaviour of the robot to reflect the task to be completed given a particular user state. A modular design approach is utilized in the control architecture to allow for the addition and/or substitution of different sensor modalities as inputs into the user and activity state modules as needed based on the intended activity. Due to its generality, the architecture can be applied to different individuals or a combination of person-centred guidance-based activities in HRI scenarios.
In our proposed HRI control architecture, we apply a hybrid approach to resolve uncertainty at both the sensor data processing level and at the decision making level. At the sensor data processing level, we utilize sensor-specific algorithms to obtain the best possible state representation prior to the decision making process. These algorithms directly deal with uncertainties and noise acquired from raw sensor readings, resulting in a more accurate representation of the state of the interaction. At the decision making level, we have incorporated a knowledge clarification layer which uses clarification dialogue between the person and robot in order to reduce errors as a result of speech recognition. Furthermore, non-deterministic human behaviours are accounted for by the MAXQ algorithm. On-line training is also utilized to adapt to non-deterministic scenarios, as well as new users.
An overview of our proposed HRI control architecture is presented in Figure 2. For the current implementation of the control architecture to the memory card game scenario, sensory information is acquired for: (i) recognizing human verbal actions via a Logitech noise-cancelling microphone, (ii) user state recognition via an emWave ear-clip heart rate sensor, and (iii) activity state monitoring using a Logitech webcam. The heart rate sensor is utilized to determine a person's affective arousal level during activity engagement.

HRI control architecture
4.1 Activity State Module
The Activity State module monitors the state of the memory game during the interaction utilizing images provided by the 2D webcam. A feature recognition and clustering technique we have previously developed based on SIFT (Scale-Invariant Feature Transform) is used to determine the number, identity and location of the cards within the activity area [16]. Pairs of picture cards utilized in the memory game have unique SIFT keypoints, allowing them to be distinguishable from each other. The clustering technique utilizes a nearest neighbour search algorithm to define regions in the 2D images containing keypoints that may potentially represent cards that have been flipped over during the game. A database of the keypoints for each picture card is utilized to determine the identity of the flipped over cards. Card recognition errors can arise during activity state recognition when cards become obstructed. This mainly occurs due to the temporary presence of human hands. Uncertainty is minimized by capturing and analysing images of the activity area over consecutive frames, so that temporary occlusions do not corrupt the recognized game state.
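The nearest-neighbour grouping step can be illustrated with the simplified sketch below, which clusters 2D keypoint coordinates by distance to running cluster centres; the radius value and the greedy assignment order are illustrative assumptions, not the exact parameters of [16].

```python
from math import dist

def cluster_keypoints(points, radius=30.0):
    """Greedy nearest-neighbour grouping: a keypoint within `radius` pixels of
    an existing cluster centre joins that cluster (updating its centre);
    otherwise it seeds a new cluster. Returns lists of point indices, where
    each cluster is a candidate flipped-over card region."""
    clusters, centres = [], []
    for idx, p in enumerate(points):
        if centres:
            k = min(range(len(centres)), key=lambda c: dist(centres[c], p))
            if dist(centres[k], p) < radius:
                clusters[k].append(idx)
                members = [points[i] for i in clusters[k]]
                # Recompute the cluster centre as the mean of its members
                centres[k] = tuple(sum(v) / len(members) for v in zip(*members))
                continue
        clusters.append([idx])
        centres.append(tuple(p))
    return clusters
```

Each resulting cluster of keypoints would then be matched against the per-card keypoint database to identify the card.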
4.2 Speech Recognition and Analysis Module
Human speech is recognized via the Speech Recognition and Analysis module. Recognition is performed by Julius, a two-pass large vocabulary continuous speech recognition (LVCSR) decoder [29]. Words are recognized based on their phonemes and their approximate location in an utterance. The LVCSR software has been customized to support the vocabulary, dialogue and action-based context needed during game playing. In particular, the vocabulary and grammar definitions have been configured with the syntactic constraints of a response or question posed to Brian 2.0. Herein, we have utilized the person-independent VoXForge acoustic model [30], which is composed of statistical representations, created via Hidden Markov Models, for each phoneme in the English language to account for persons with different accents and speaking styles. The acoustic model has been trained using 625 unique voices.
The reliability of the spoken utterance is determined using word confidence scores provided by Julius, which are based on a combination of predictor features (e.g., acoustic and language model scores). We then determine the weighted average of all the confidence scores of the recognized utterance. If the weighted average is low, or if there are multiple results with similar weighted averages, this information is sent to the Knowledge Clarification layer in order to resolve the uncertainty via the robot asking clarification questions.
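A minimal sketch of this decision rule follows; the acceptance and margin thresholds are illustrative assumptions (the exact values are not reported here), and a plain mean is used in place of the unspecified weighting.

```python
def needs_clarification(hypotheses, accept=0.7, margin=0.05):
    """hypotheses: list of (utterance, [per-word confidence scores]) from the
    recognizer. Returns (best utterance, whether clarification is needed),
    scoring each hypothesis by its mean word confidence."""
    scored = sorted(((sum(c) / len(c), u) for u, c in hypotheses), reverse=True)
    best_score, best_utterance = scored[0]
    too_low = best_score < accept                      # single low-confidence result
    ambiguous = len(scored) > 1 and best_score - scored[1][0] < margin
    return best_utterance, too_low or ambiguous
```

When the second element returned is true, the utterance would be handed to the Knowledge Clarification layer rather than acted upon directly.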
4.3 User State Module
The User State module is used to determine a person's task-based user state during game playing. This is determined during the proposed activity using a combination of affective arousal and activity performance. Affective arousal is the intensity with which emotional stimuli are perceived [31]. Heart rate has a long history of being used as an index of arousal [32]. Heart rate data is gathered from the user during the interaction at the 2 Hz sampling rate provided by the sensor. The baseline heart rate, which is an average of 10 valid data points, is acquired before the start of the activity. Subsequent valid heart rate readings are compared to this baseline, with a threshold of 5 bpm, to determine if the person is in a high or low affective arousal state. Activity performance is determined by whether or not matching card pairs were found in the previous round of the memory game by the Activity State module (Table 1). The 5 bpm heart rate threshold, as well as the user states in Table 1, were developed through the monitoring of numerous experiments. In these experiments, we were able to detect increased heart rate when a person was faced with either a stressful or an exciting situation in an activity. In the context of the memory game, stress was directly related to the scenario in which a matching card pair could not be found, and excitement was directly related to matching a pair of cards.
Task-based User States
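Assuming the four user states are those reported later in the paper (excited, pleased, neutral and stressed), the mapping from arousal and activity performance can be sketched as follows; the exact assignments of Table 1 are assumed here, not quoted.

```python
def user_state(heart_rate, baseline, matched_last_round, threshold=5):
    """Combine affective arousal (heart rate vs. baseline, 5 bpm threshold)
    with activity performance (match found in the previous round) to obtain
    a task-based user state. The specific mapping is an assumption."""
    high_arousal = heart_rate - baseline >= threshold
    if matched_last_round:
        return "excited" if high_arousal else "pleased"
    return "stressed" if high_arousal else "neutral"
```

Under this mapping, elevated heart rate after a failed match yields the stressed state that triggers the robot's assistive behaviours.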
4.4 Robot Emotional State Module
The Robot Emotional State module uses the person's user state and the current assistive action of the robot to determine the emotional state of the robot. The objective of the emotional state module is to determine which robot emotion will elicit an appropriate response from the human in order to accomplish a given task while also responding appropriately to a person's user state. We utilize a finite-state machine approach to match the appropriate robot emotion to a given user state and the robot's assistive action within the context of the cognitively stimulating activity. For the memory game, the robot emotions are: happy, neutral and sad. For example, when the person finds a matching card pair and is in an excited state, the robot celebrates with him/her by being in a happy state. The robot is sad when it has to repeat an instruction after a long period of waiting. Sad is chosen for re-engagement based on human response to this emotion as outlined in empathy theories (wanting to help a person who is sad in order to relieve him/her of this emotion), as well as the desire to achieve an internal self-rewarding goal (feeling good about oneself by helping others). In general, in all cases when the user is stressed, regardless of the robot action to be implemented, the robot will try to improve the user state of the person by being in a happy state. For all other cases not mentioned here, the robot's emotional state is neutral.
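The finite-state mapping described above can be sketched as follows; the action labels are hypothetical placeholders for the robot's assistive behaviours.

```python
def robot_emotion(user_state, robot_action):
    """Finite-state mapping from (user state, assistive action) to robot
    emotion, following the cases described in the text; unlisted cases
    default to neutral."""
    if user_state == "stressed":
        return "happy"                 # always try to improve a stressed user's state
    if user_state == "excited" and robot_action == "celebrate":
        return "happy"                 # celebrate a correct match together
    if robot_action == "repeat_instruction":
        return "sad"                   # sadness chosen to re-engage (empathy response)
    return "neutral"
```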
4.5 Behaviour Deliberation Module
The Behaviour Deliberation module is the main decision making module within the HRI control architecture. This module requires inputs from all four of the aforementioned modules in order to determine the robot's effective assistive behaviour via a MAXQ hierarchical reinforcement learning approach [19]. The overall behaviour of the robot is physically implemented by the actuator control module using a combination of both verbal and non-verbal forms of communication. In the next section, we will discuss the detailed design of the Behaviour Deliberation module as it pertains to the robot engaging a person in the card game of memory.
5. The Behaviour Deliberation Module of Brian 2.0
The Behaviour Deliberation module is composed of two layers: (i) Knowledge Clarification and (ii) Intelligence.
5.1 Knowledge Clarification Layer
This particular layer is in charge of generating a clarification dialogue between a person and the robot in order to reduce errors as a result of speech recognition. Namely, if the average confidence score for the utterance by the person is low, as determined by the Speech Recognition and Analysis module, the robot will state the utterance that has the highest relative confidence score and ask the person to confirm his/her request by providing positive/negative feedback in the form of yes or no answers. In the case of multiple recognition results with similar confidence score averages, the robot will individually clarify the top three results to determine if the user is asking to recall, identify or localize a card. This allows the robot to match the utterance with its own stored activity-specific utterance templates and hence, increase the accuracy of speech recognition.
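A simplified sketch of this clarification flow is given below; the prompt wording is illustrative, not the robot's actual dialogue.

```python
def clarification_prompts(ranked_utterances, similar_scores=False):
    """Build the robot's clarification questions: confirm the single best
    hypothesis with a yes/no question, or step through the top three
    candidate intents (recall / identify / localize a card)."""
    if not similar_scores:
        return ["Did you say: '%s'? Please answer yes or no." % ranked_utterances[0]]
    intents = ("recall", "identify", "localize")
    return ["Are you asking me to %s a card? Please answer yes or no." % i
            for i in intents[:min(3, len(ranked_utterances))]]
```

A confirmed answer is then matched against the robot's stored activity-specific utterance templates.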
5.2 Intelligence Layer
The intelligence layer consists of the MAXQ HRL algorithm, which is capable of adapting the robot's behaviour to the current assistive interactive scenario. MAXQ is utilized to determine the overall behaviour of the robot as a function of both verbal (speech) and non-verbal (gestures, and facial expressions and intonation based on the robot's emotions) communication means.
5.2.1 The MAXQ Learning Algorithm
MAXQ provides a hierarchical decomposition of a given reinforcement learning problem (task) into a set of sub-problems (sub-tasks). With respect to the memory game, the overall assistive task aligns with the objective of the card game: to identify and check that cards flipped over result in a corresponding pair match. MAXQ is able to support temporal abstraction, state abstraction and sub-task abstraction, which are important in the decision making process for the socially assistive robot in the memory game scenario. The need for temporal abstraction exists since, depending on the player's skill level and style of play, some actions may take varying amounts of time to execute. State abstraction is beneficial since not all state variables are needed for certain tasks. For example, when instructing the player to flip back unmatched cards in the game, the identity of the cards is irrelevant and should not affect the robot's behaviour. Due to state abstraction, the overall value function for this task can be represented more effectively by utilizing only a subset of the state variables, reducing memory requirements. Sub-task abstraction is also necessary because it allows sub-tasks to be learned only once; the solution can then be shared by other sub-tasks.
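The decomposition can be illustrated with a minimal recursive evaluation of the decomposed value function, Q(i, s, a) = V(a, s) + C(i, s, a), following Dietterich's MAXQ formulation [19]; the sub-task names in the usage below are hypothetical.

```python
def maxq_value(node, s, V_prim, C, children):
    """Evaluate the decomposed MAXQ value function.

    Primitive actions return their stored expected one-step reward V(a, s);
    composite sub-tasks return max over child actions a of
    Q(node, s, a) = V(a, s) + C(node, s, a), where C is the completion
    function (expected reward for finishing `node` after `a` terminates)."""
    if node not in children:                       # primitive action
        return V_prim[(node, s)]
    return max(maxq_value(a, s, V_prim, C, children) + C.get((node, s, a), 0.0)
               for a in children[node])
```

For instance, with a root task whose children are hypothetical "help" and "wait" primitives, the root value is the best child value plus its learned completion term.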
This paper presents the first application of the MAXQ algorithm to multi-modal interactions with socially assistive robots. We propose a new two-stage training process for our learning strategy which includes both off-line and on-line learning using real user data (discussed in Sections 6 and 7) in order to allow the robot to personalize its interactions with different individuals.
5.2.2 Task and Value Function Decomposition
At the core of MAXQ is the value function decomposition, which describes how to decompose the overall value function of the root task into a set of value functions for the individual sub-tasks.

Hierarchical task graph for the memory game scenario (primitive robot actions on bottom row are defined in Table 2)
Examples of Primitive Robot Actions
5.2.3 State and Action Definitions
A set of states,
Table 2 and Figure 4 show examples of primitive robot actions. The primitive actions for the sub-tasks

Example robot behaviours: (a) providing celebration in a happy emotional state after a correct match, (b) providing instruction in a sad state when game disengagement occurs and (c) providing help in a neutral state.
Every sub-task in the task graph has a termination condition. For example, for the
At the start and end of the game, the Deliberation module implements the following behavioural actions for the robot: 1) at game start: “Hi, my name is Brian. I am glad you want to play the memory game with me. Let's start.”, and 2) at game end: “Congratulations, you have completed the memory game.”
6. MAXQ Training
We have implemented a two-stage training procedure for our MAXQ approach. In the 1st stage, we focus on determining appropriate behaviours for the robot based on the structure of the game. After the robot has learned its optimal behaviours with respect to the card game, the 2nd training stage focuses on developing personalized interactions for each person using his/her user state during game playing. Here, we discuss the 1st training stage. The 2nd training stage is detailed in Section 7.
The objective of the 1st training stage is to learn the robot's optimal behaviours based on human actions and activity states. On-line training would be unrealistic to use at this stage due to the large number of possible states and actions that need to be explored, as well as the extensive amount of experience required to learn the optimal strategy. Therefore, we utilize an off-line training procedure that incorporates a human user simulation model, error models for both speech recognition and activity state detection, and an epsilon-decreasing exploration strategy that can provide the extensive interaction experience needed for policy learning.
6.1 Human User Simulation Model
A simple probabilistic approach for user modelling is the bi-gram model, in which the probability of a person's action is conditioned on the robot's preceding action.
Wizard-of-Oz (WOz) experiments consisting of ten participants, each playing the memory game while interacting with the robot, were performed to acquire the necessary data for the bi-gram model. In these WOz experiments, a member of our research team sat in a different location and only controlled the decisions regarding the behaviours of the robot (i.e., the behaviour deliberation module); all other modules of the control architecture were autonomous. To promote natural interactions, we did not tell the participants how to behave; we merely requested that they play the memory game. In this bi-gram user model approach, a person's action is dependent only on the last robot action.
Bi-gram User Simulation Model
If there are 0 cards initially flipped over, this action is described as flipping over two cards at once.
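Sampling simulated user actions from such a bi-gram model is straightforward; the sketch below assumes the model is stored as a nested table of conditional probabilities, with hypothetical action names.

```python
import random

def sample_user_action(last_robot_action, bigram, seed=None):
    """Draw a simulated human action from P(human action | previous robot
    action), where `bigram` maps each robot action to a table of action
    probabilities estimated from the WOz data."""
    rng = random.Random(seed)
    table = bigram[last_robot_action]
    actions = list(table)
    return rng.choices(actions, weights=[table[a] for a in actions], k=1)[0]
```

During off-line training, each sampled action would then be perturbed by the speech recognition and activity state error models before being passed to the learner.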
6.2 Speech Recognition Error Model
To account for variations in recognition performance caused by noise and speaker-dependent differences, we use a speech recognition error model that assumes a new speaker for every game. For each recognition task (RT), the recognition rate (RR) is computed following [22].
The recognition results of ten different speakers are used to compute the overall RR and standard deviation for each recognition task.
Speech Recognition Rates for Recognition Tasks
6.3 Error Modelling for Activity State Detection
The activity state detection error is based on determining: 1) the identity of the cards in the game, and 2) the number of cards flipped over by the user. The card identification error is incorporated into the simulation model for when the robot must provide help to the user. The game area is split into a 4×4 grid, representing the location of the cards. Table 5 shows the detection rates for each section based on the results of ten detection trials per section.
Card Identity Detection Rates
Errors resulting from detecting an incorrect number of cards flipped over are also incorporated in our simulation model for when the robot must provide the appropriate instructions based on the activity state. Table 6 shows the detection rates for when 0, 1 or 2 cards are flipped over.
Detection Rates for the Number of Cards Flipped Over
6.4 Rewards
The aim of our reward system is to minimize the cost of the actions taken to reach the ultimate goal of completing the game. In the memory game, a desired action is defined as an appropriate action for the current state (e.g., the robot congratulating a person when he/she has found matching cards). Every completed primitive action is given a negative reward of −1, whereas undesired actions are given an additional negative reward of −20. Desired primitive actions are not further rewarded. A positive reward of +21 is given at 1st level sub-tasks if a person is asking a help-related question and the appropriate help-related sub-task is executed.
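The reward scheme above can be summarized in code as follows; this is a direct transcription of the stated values, with hypothetical predicate names.

```python
def primitive_reward(desired):
    """Every completed primitive action costs -1; an undesired action incurs
    an additional -20 penalty."""
    return -1 + (0 if desired else -20)

def help_subtask_bonus(help_question_asked, appropriate_subtask_invoked):
    """+21 bonus at a 1st level sub-task when a help request is answered by
    invoking the appropriate sub-task."""
    return 21 if help_question_asked and appropriate_subtask_invoked else 0
```

The +21 bonus more than offsets the -1 step cost, so answering a help request appropriately is a net-positive decision for the learner.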
6.5 Exploration Policy
An epsilon-decreasing exploration strategy is applied during off-line training. At the beginning, ε is set to 1 for the Root task and the 1st and 2nd level sub-tasks to encourage the maximum amount of exploration possible. 3rd level sub-tasks, which only evoke primitive actions, employ a greedy policy (i.e., ε = 0), in which the action with the highest potential reward is always selected.
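A single epsilon-decreasing action selection step can be sketched as follows; the multiplicative decay factor is an illustrative assumption, as the decay schedule is not specified here.

```python
import random

def epsilon_decreasing_action(Q, s, actions, epsilon, decay=0.999, rng=None):
    """With probability epsilon pick a uniformly random action (explore),
    otherwise pick the highest-valued action (exploit); epsilon is then
    shrunk multiplicatively for the next decision."""
    rng = rng or random.Random(0)
    if rng.random() < epsilon:
        a = rng.choice(actions)
    else:
        a = max(actions, key=lambda b: Q.get((s, b), 0.0))
    return a, epsilon * decay
```

Setting epsilon to 1 reproduces the fully exploratory behaviour used at the start of training for the upper-level sub-tasks, while epsilon = 0 gives the greedy policy used at the 3rd level.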
6.6 Performance Analysis
We have performed a study to compare the rate of convergence of our MAXQ approach versus a traditional flat Q-learning technique.

Comparison of MAXQ and flat Q-learning for the memory game
7. Social HRI Experiments
Once the 1st training stage has determined the robot's appropriate behaviours based on the structure of the memory game, the 2nd training stage is implemented. This on-line training stage is used to allow Brian 2.0 to learn its optimal assistive behaviours based on a person's user states during game engagement. The aim is to select the robot's behaviours in an attempt to maintain positive (i.e., pleased or excited) user states during game playing. We postulate that this will, in turn, allow a person to be more engaged in the cognitively stimulating activity.
7.1 Procedure
The on-line training procedure was tested on ten healthy adults (ranging in age from 20 to 35) as they played the memory game twice while interacting with the robot. A baseline heart rate was obtained for each participant prior to game initiation. A successful action is defined as a robot action that improves a person's user state from a stressed state to a non-stressed state.
In this experiment, a scenario involving activity-induced stress is simulated in order to demonstrate that, in such situations, the robot can be effectively used to minimize this type of stress. As our participants are healthy adults, we have imposed the following constraint on the game: each participant must try to win the game with five or fewer incorrect matches. This system performance experiment will allow us to verify the controller's ability to detect a user state and adapt the robot's behaviour accordingly based on this user state. Furthermore, these healthy adults can provide detailed comments on their experience and the performance of the robot via post-experiment surveys and self-studies. This experiment will provide us with valuable feedback on the functionality of the controller in order to optimize its design prior to conducting long-term cognitive training interventions with other potential end-users.
We have developed a novel on-line training procedure utilizing a person's user state to explore robot behaviours such as providing instruction or help when appropriate, and rewarding the behaviours that succeed at improving user state during the memory game. Exploration of behaviours is triggered by the robot detecting that the person is in a stressed state. At this user state, the exploration policy, ε, is non-greedy for the
At the end of the 1st stage of training, the
7.2 Results and Discussions
Preliminary experiments demonstrate that the proposed on-line training procedure allows the robot to learn its optimal assistive behaviours during personalized interactions. Namely, the robot successfully detects user states at every interaction, explores different behaviours, and is rewarded when its behaviours improve user states.
On average, the participants played the two games for approximately 40 minutes. Figure 6 shows the user states of all ten participants during the two games. One interaction is defined to include the robot detecting a user's action (which updates the activity state), as well as the robot's reaction during game playing. From Figure 6, we can see that the participants had unique user state responses. Some participants, such as A, C, D, E, F and I, felt more stressed at the beginning and/or middle of the overall game playing session and had more positive user states near the end of the session. Other participants, such as G, H and J, felt stressed throughout the game sessions, while participant B felt more stressed during the latter half of the game session.

Participant user states detected during the memory game.
Figure 7 provides a more detailed view of two sets of ten different interactions for each of the participants, i.e., one for each game. The robot was able to explore and determine appropriate behaviours during game playing utilizing the proposed MAXQ control architecture and on-line training procedure based on the participants' user states and activity states. For example, for Participant A, the robot was able to detect that the person was in a stressed state at interactions 8, 12 and 35, and provided assistance via the

Interaction details for all participants.
Figure 8 shows the rewards for the

Rewards for the

Rewards for the
A post-experiment assessment was administered after the HRI scenario, which included a self-study to assess the robot's ability to detect changes in user state throughout the activity and a questionnaire to obtain feedback on the robot's behaviour during game playing. For the self-study, each participant was asked to identify when he/she felt stressed (negative high arousal) or excited (positive high arousal) during the course of the activity using video playback. We compared the self-study results, as well as activity performance, to the user states detected by the robot in order to determine the average user state prediction accuracies for the participants. We found that the average state prediction accuracies were 82.8% for excited, 81.2% for pleased, 76.7% for neutral and 80.3% for stressed. From these results, we can see that high recognition accuracies were achieved when detecting the participants' changes in user state.
For the questionnaire, the participants were asked to choose their responses from a list of robot behaviours. The participants were first asked to identify the robot behaviours they felt were the most effective at relieving stress during game playing. Table 7 summarizes their responses based on a ranking of the total number of responses for each behaviour. The robot providing instructions was ranked the highest by the participants, which concurs with the rewards presented in Figure 8. Participants A and D mentioned that both the robot's instructions and help were effective at relieving stress. Even though the rewards for instruction did not increase for these two participants, the rewards for help did. The four participants (A, D, F and H) that stated that the robot's help behaviour was one of the most effective behaviours at relieving stress, also had increased rewards for the help sub-task during the interactions.
8. Conclusions
In this paper, we present the design of a novel modular learning-based control architecture for our socially assistive robot Brian 2.0, enabling the robot to be a social motivator by providing assistance, encouragement and celebration during the course of a cognitively stimulating activity. Namely, the control architecture utilizes a MAXQ hierarchical reinforcement learning approach in order for the robot to learn its own appropriate assistive behaviours based on the structure of an activity and further personalize the interaction based on a person's user state, where the latter is defined as a combination of affective arousal and activity performance. Results from off-line and on-line training validate the performance of the learning algorithm with respect to the robot's ability to learn its appropriate assistive behaviours to maintain positive user state during a memory card game. Our future work consists of designing a pilot study with the robot at our collaborative long-term care facility with elderly persons with mild cognitive impairment to observe the robot's ability, using the proposed controller, to be a social motivator and engage individuals in the memory game, as well as to study long-term human-robot relationships between Brian 2.0 and a user.
