1. Introduction
In this paper we present a programming language and environment for the specification, representation and interpretation of service robot tasks. Programming service robots is a complex exercise involving several kinds of programs defined in at least three, mostly orthogonal, dimensions: the first consists of the algorithms for programming the robot's basic perception and action behaviours (e.g., vision, speech recognition and interpretation, navigation); the second involves system programming which deals with processes and agents, process communication and coordination, and also with the drivers for the diverse input and output devices; finally, the third dimension addresses the representation and programming of the service robot task structure. This is known in the literature as behaviour engineering. Thanks to recent advances in computing, algorithms and sensor and mechatronic technologies, current robotic systems are often equipped with a wide range of functionalities such as navigation, grasping and object recognition. As a result, robots are now capable of performing complex tasks that involve many of these functionalities working concurrently. In order for a robot to perform tasks autonomously, a way to specify its behaviour is needed, i.e., an action selection mechanism which decides what to do based on what the robot perceives and the model of the task. This is especially true for service robots which operate in highly dynamic and unstructured environments. Several strategies have been proposed for behaviour engineering, ranging from using general programming frameworks and conventional languages to developing domain-specific languages. However, the challenge of how to specify behaviour in a flexible and concise way still remains. In this paper we are concerned with specifying and programming service robots in this latter dimension.
In Section 2 we overview several approaches for representing and programming task structure, with focus on service robot tasks, and place the present approach in such a context.
In Section 3 we present our approach for representing the structure of service robot tasks. Tasks are conceptualized in terms of generic protocols, which we call dialogue models (DMs), which the robot needs to perform in order to achieve its goals. The underlying computational model of the present approach is the functional recursive transition network (F-RTN) [1], an extension of the recursive transition network (RTN) [2]. RTNs have an expressive power equivalent to that of a push-down automaton, so the model goes beyond the finite state automata (FSA) or finite state machine (FSM) model (i.e., in the formal sense in which an FSA corresponds to a regular language) commonly used in service robot task programming, but preserves the graph-oriented structure, providing a very good compromise between expressive power and efficient computation. The programming environment is embedded within the robot's architecture and operating system through a simple interface, permitting fast prototyping and development. In addition, the system provides a simple and flexible interface to knowledge bases and deliberative resources, like planners, theorem provers and problem solvers, which can be used on demand during task execution.
The specification and interpretation of SitLog's programs are presented and illustrated in Section 4. These follow closely the Prolog notation and the core of the system consists of two interpreter programs working in tandem: the first implements the F-RTN model (i.e., for traversing the recursive graph), and the second interprets expressions of a functional language called L.
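To make the distinction between an RTN and an FSM concrete, the following is a conceptual Python sketch (not SitLog code; the network encoding and names are invented for illustration): arc labels may be terminals or the names of subnetworks, and traversing a subnetwork label amounts to a recursive call, which gives the formalism push-down (context-free) power.

```python
# Conceptual sketch: a recursive transition network traversed depth-first.
# Arc labels are either input tokens or subnetwork names; entering a
# subnetwork is a recursive call, so the traversal uses the call stack
# exactly as a push-down automaton uses its stack.

def rtn_accepts(nets, start, tokens):
    """True iff the RTN `nets`, started at net `start`, accepts `tokens`."""
    def walk(net, state, pos):
        if state in nets[net]["final"]:
            yield pos                        # may stop here
        for label, nxt in nets[net]["arcs"].get(state, []):
            if label in nets:                # label names a subnetwork
                for p in walk(label, nets[label]["start"], pos):
                    yield from walk(net, nxt, p)
            elif pos < len(tokens) and tokens[pos] == label:
                yield from walk(net, nxt, pos + 1)
    return any(p == len(tokens) for p in walk(start, nets[start]["start"], 0))

# The language a^n b^n, which no finite state machine can recognize:
# S -> a S b | empty, encoded as a single self-recursive network.
ab = {"S": {"start": 0, "final": {0, 3},
            "arcs": {0: [("a", 1)], 1: [("S", 2)], 2: [("b", 3)]}}}
```

Under this toy encoding, `rtn_accepts(ab, "S", list("aabb"))` succeeds while `rtn_accepts(ab, "S", list("aab"))` fails, which no FSM over the same alphabet can reproduce for all n.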
DMs are represented independently of perception and action modalities, and SitLog's programs need to be related to the actual interpretations and actions performed by the robot during the execution of a task. For this, the SitLog interpreter is embedded in the robot's architecture. In Section 5 we describe the interaction-oriented cognitive architecture (IOCA) [1] that we have developed in conjunction with SitLog for this purpose.
In Section 6 the implementation of the present formalism and programming environment in the robot Golem-II+ is presented. For validation purposes we have developed the full set of tasks of the RoboCup@Home competition (Rulebook 2013) with very promising results. SitLog is independent of task and domain, and can be embedded in different architectures and operating systems. The paper concludes in Section 7 with a brief reflection on the generality of the present formalism.
2. Robotic Programming Languages
In this section we review a variety of strategies for specifying and programming the structure of robotic tasks. Among the first domain-specific languages designed for behaviour engineering was the Behaviour Language [3], built on an extension of the subsumption architecture [4]. In the Behaviour Language, behaviours are defined as collections of rules written in a subset of Lisp. From these rules, behaviours are compiled into augmented finite state machines (AFSMs) and these in turn into assembler code. Other prominent early instances of robotic programming languages are Colbert [5] and the Task Description Language (TDL) [6]. Colbert is a sequencer language created for the Sapphira architecture and used for developing intermediate modules connecting motion control and planning. The semantics of a Colbert program is based on FSMs, and programs are written in a subset of ANSI C. TDL, for its part, defines a syntax as an extension of C++ for task-level control such as task decomposition, synchronization, execution monitoring and exception handling. TDL code is compiled into pure C++ code with calls to a platform-independent task-control management (TCM) library.
Recent instances of domain-specific languages include the Extensible Agent Behaviour Specification Language (XABSL) [7], XRobots [8] and b-script [9]. The semantics of both XABSL and XRobots is based on hierarchical state machines. In XABSL, a given state of the whole system is determined by a subset of state machines. XABSL has been implemented on many platforms, mainly on robots that participate in the RoboCup soccer competitions. XRobots treats states as behaviours which are first class objects in the language and can be passed as parameters to other behaviours. It also integrates template behaviours, allowing generalized behaviours to be customized and instantiated. In contrast, b-script describes hierarchical complex behaviours via specialized generators. b-script's syntax is built on a combination of Python and C++. Programs written in b-script can be executed by an interpreter or compiled into C++ code.
Another common strategy for behaviour engineering is the use of general programming frameworks. GOLOG [10], a logic programming language based on the situation calculus, has been shown to be applicable to robot programming [11]. Some extensions of this language have added features such as concurrency [12], continuous change and event-driven behaviours [13], and execution and monitoring systems [14] to make it more suitable for some applications. Frob [15], a functional reactive programming framework embedded in Haskell, was introduced by Peterson et al. for programming robots at various levels of abstraction. For time-critical applications such as space rovers, the reactive model-based programming language (RMPL) [16, 17] has been widely adopted. Ziparo et al. [18] presented a new formalism based on Petri Nets, called Petri Net Plans (PNP), for describing robot and multi-robot behaviours. UML statecharts have been applied for the same purpose, especially in soccer robots [19, 20]. An important advantage of the statechart and Petri Net Plans approaches is that agent behaviours can be naturally designed through intuitive graphical tools.
In the case of service robots, applications and tasks are commonly implemented with variants of FSMs. For instance, the behaviour of HERB 2.0 [21] is modelled by the behaviour engine (BE) [22] through three different layers using hybrid state machines. This approach has allowed the service robot HERB 2.0 to carry out complex manipulation tasks in human environments. In [23], the TREX framework was used to control the behaviour of a PR2 robot. TREX integrates task planning, discrete and continuous states, and durative and concurrent actions. Under this framework, the robot was able to navigate through an office environment, open doors and plug itself into electrical outlets. Bohren et al. developed a Python library called SMACH [24, 25] based on hierarchical concurrent state machines. SMACH served as the task-level executive of a PR2 robot in a drink fetching and delivery application [24]. For the GRACE robot [26], FSMs were created under TDL to structure the tasks of the AAAI Robot Challenge. In the context of RoboCup@Home, the robots Cosero and Dynamaid [27] build hierarchical state machines to perform the tests of the competition, whereas the robot Caesar [28, 29] uses an enhanced version of GOLOG named ReadyLog [30].
In summary, several different strategies and domain-specific languages have been proposed for robot behaviour engineering. Most of these strategies are implemented as extensions of conventional programming languages, mainly in the imperative, specification and functional paradigms. The formalisms used in these strategies are commonly built upon the FSM or extensions of it, often resulting in limited expressive power and unwieldy task programming. In contrast, SitLog is a logic programming language especially designed for service robot tasks and constructed on the more expressive formalism of DMs, which exploit the context and history of the task to decide the actions to be executed. Within the logic programming paradigm, ReadyLog is focused on reasoning about actions, while SitLog has the notions of situation and task structure as its main representational objects. Here, we argue that SitLog is particularly suitable for specifying task-level behaviours of service robots in a flexible and concise way, as illustrated in the example in Section 4.4.
3. Task Structure, Situations and Dialogue Models
Service robot tasks can be construed in terms of states, in which the robot is capable of performing some specific set of actions, depending on the input information at the state and possibly on the previous states visited by the robot during the execution of the task. States can be seen from a dual perspective as states in the world and as informational objects in the robot's memory. Here, we say that the robot is in a situation whenever it holds such an information state.
An important question from this perspective is how much information needs to be contained within information states in service robot tasks. Here there is a trade-off between expressive power and computational cost: if states have a small amount of information and the next state is determined to a great extent by the external input, computations are efficient, but complex tasks are cumbersome and difficult to model; on the other extreme, if states contain large amounts of information and the determination of the next state requires inference, expressing the task structure may be simple but at a high computational cost. These two extremes correspond to two opposing traditions in interaction and dialogue modelling: one is the use of FSMs, where states have a minimal amount of information expressed by the constant labels on their arcs; this approach is common in dialogue systems and interactive applications, and also in many service robots, as mentioned above. The other is the artificial intelligence tradition in dialogue modelling involving complex inference and the notion of conversational moves that need to be identified dynamically through searching during the execution of the task (e.g., [31] and derived work). In this latter case, a state may contain a full “mental state” including a temporal and spatial situation, the history of the task, domain knowledge, the beliefs, desires and intentions of the agent, and even common sense knowledge.
In the present framework we adopt the view that the state's information content consists of the knowledge of the potential actions performed by another agent (the interlocutor) at such a state, with or without communicative intent, in addition to the knowledge of the potential events which can occur in the world at that particular state; we refer to this knowledge as the expectations of the situation.
From another perspective, expectations are representations of the potential interpretations (i.e., the outputs of perception) that are plausible in a given context within a task structure, and differ from the raw data produced by low-level sensors, which need to be handled by low-level processes in a context-independent fashion (i.e., independently of task and domain). A situation is then an abstraction over a collection, possibly large, of perceptions and actions, and tasks can often be modelled through a small set of situations. In addition, while an FSM changes its state when a symbol labelling one of its arcs is consumed (i.e., an event occurs in the world), an agent changes its situation when its set of expectations is changed. Although a change of expectations is often due to an external event, there are many external events that do not change the expectations, and the expectations can also be changed as a result of an inference, which is internal to the agent. Hence, situations are intensional knowledge objects, in opposition to FSM states, which are extensional and deal directly with concrete input.
Other kinds of knowledge stored in the robot's databases and/or knowledge bases, like domain specific or general concepts, or even the robot's beliefs, desires and intentions, are not included in the situation in the present framework. These knowledge objects can be retrieved and updated from memory during the interpretation of a situation, but such knowledge is to a large extent context independent and not directly relevant to communication and interaction.
More generally, situations are contextual informational objects and the notion of expectation presupposes that the robot is placed in context in relation to the task. For this, the present notion of situation is restricted to tasks where this condition is met. We refer to tasks that can be construed in this form as practical tasks with their associated practical task and domain-independent hypotheses, extending Allen's corresponding practical dialogues notion and hypotheses [32] as follows: the structure of practical tasks, although complex, is significantly simpler than open human tasks (i.e., practical task hypothesis) and within the genre of practical tasks the structure of the tasks and task management are independent of the task domain and the particular task being performed (i.e., domain-independent hypothesis). We also advance the hypothesis that practical tasks lie somewhere between FSMs, which are too limited in their expressive power, and open search engines which demand an unbounded computational cost. SitLog has been developed to support this notion of situation and task structure. Next, we introduce the language and illustrate the framework with a simple application.
4. Specification of SitLog
4.1. Situations and Dialogue Models
A task T is represented in SitLog as a set of DM types T = [dm1, dm2, …, dmn]. In turn, each DM consists of a set of situation types dmi = [s1, s2, …, sm]. During the interpretation process, instances of DMs and situations are created and executed dynamically, unfolding a graph that corresponds to the structure of the task. Identifiers and variables within DMs and situations have a local scope, and these structures can be thought of as abstract data types in the standard sense. The arguments of DMs and situations are called by reference, and these abstract objects support recursive calls, providing a simple and expressive way to represent complex cyclic behaviours, as illustrated below.
Dialogue model and situation names or IDs can be constant or predicate symbols with one or more arguments (e.g., dm1, dm2(X, Y, Z), s1, s3(Z, W)). Following Prolog's notation, identifiers starting with lower case letters stand for constants and predicates, and those starting with capital letters stand for variables. In addition to Prolog's variables, SitLog supports the definition of global and local variables; these are atoms (i.e., constants) with an associated value that can be assigned, modified or recovered within the situation's body and is preserved along the execution of the task in the case of global variables, or within the execution of a dialogue model in the case of local variables, as explained below.
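The scoping regime just described can be sketched as follows (a conceptual Python sketch, not SitLog's implementation; all names are invented): global variables persist for the whole task, while local variables live only as long as the DM instance that created them.

```python
# Sketch of SitLog-style variable scoping: globals survive the whole task,
# locals are discarded when their dialogue model (DM) instance finishes.

class VarStore:
    def __init__(self):
        self.globals = {}
        self.locals_stack = []            # one frame per active DM instance

    def enter_dm(self):
        self.locals_stack.append({})      # fresh local scope for a new DM

    def exit_dm(self):
        self.locals_stack.pop()           # locals vanish with their DM

    def set(self, name, value, scope="local"):
        frame = self.globals if scope == "global" else self.locals_stack[-1]
        frame[name] = value

    def get(self, name):
        if self.locals_stack and name in self.locals_stack[-1]:
            return self.locals_stack[-1][name]
        return self.globals.get(name)     # None if the atom has no value
```

For example, a counter assigned as global inside one DM remains readable after that DM finishes, whereas a local assigned in the same DM does not.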
A DM is expressed as a clause with three arguments: an identifier, a set of situations, and a set of local variables, as follows:
In this definition
The symbols at the left of ==> are the attribute names, and the symbols at the right stand for variables or expressions through which the actual expectations, actions, next situations and control information of the situation are expressed. These expressions are evaluated during the execution of the situation, and their values correspond to the concrete interpretations and actions performed by the robot in the situation, including the selection of the next situation, which can be a dynamic choice depending on the context.
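The selection step described above can be sketched as follows (a conceptual Python sketch, not SitLog syntax; the arc encoding and the names in the toy situation are invented): once the attribute expressions have been evaluated, the interpreter picks the arc whose expectation matches the current perceptual interpretation, yielding the actions to perform and the next situation.

```python
# Sketch: each arc of a situation pairs an evaluated expectation with the
# action and next situation to select when that expectation is met.

def select_arc(arcs, interpretation):
    """Return the (action, next_situation) pair of the first arc whose
    expectation matches the perceptual interpretation, else None."""
    for expectation, action, next_situation in arcs:
        if expectation == interpretation or expectation == "any":
            return action, next_situation
    return None  # nothing expected was met: no arc is selected

# toy greeting situation (all names invented)
greet_arcs = [("hello", "say(hello)", "offer_help"),
              ("goodbye", "say(bye)", "final"),
              ("any", "say(repeat_please)", "greet")]
```

A catch-all expectation like `"any"` placed last gives the situation a recovery arc for unexpected input, a common pattern in dialogue management.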
Situations have three mandatory attributes:
The value of the
The value of the attribute
In addition to the interpretation modality types, there are two generic types: recursive and final situations.
Whenever a recursive situation is executed, the current DM is pushed onto the stack, and the initial situation of the embedded model is interpreted. In addition, whenever a final situation is reached, the current DM is popped from the stack, together with the identifier of the particular final situation in which the task was concluded. Recursive situations permit structuring tasks at different levels of abstraction, and behaviours are grounded in generic situations which perform actual interpretations and actions in concrete input and output modalities.
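The push-down control regime just described can be sketched as follows (a conceptual Python sketch, not SitLog's F-RTN interpreter; the DM encoding and names are invented): a recursive situation pushes the current DM and enters the embedded one, and a final situation pops back, or concludes the whole task when the stack is empty.

```python
# Sketch of the stack discipline for recursive and final situations.
# 'speech' stands for any modality-specific situation type whose arcs are
# indexed by the expectation met in the input.

def run(dms, top_dm, inputs):
    stack = []                                   # suspended (dm, resume_sit)
    dm, sit = top_dm, dms[top_dm]["initial"]
    trace, it = [], iter(inputs)
    while True:
        node = dms[dm]["situations"][sit]
        trace.append((dm, sit))
        if node["type"] == "final":
            if not stack:
                return sit, trace                # whole task concluded here
            dm, sit = stack.pop()                # resume the calling DM
        elif node["type"] == "recursive":
            stack.append((dm, node["next"]))     # where to resume afterwards
            dm, sit = node["embedded"], dms[node["embedded"]]["initial"]
        else:                                    # modality-specific situation
            sit = node["arcs"][next(it)]         # expectation met -> next sit

# toy task: the main DM embeds a confirmation sub-DM (names invented)
dms = {
    "main": {"initial": "s0", "situations": {
        "s0": {"type": "recursive", "embedded": "confirm", "next": "success"},
        "success": {"type": "final"}}},
    "confirm": {"initial": "q", "situations": {
        "q": {"type": "speech", "arcs": {"yes": "ok", "no": "ok"}},
        "ok": {"type": "final"}}},
}
```

Running `run(dms, "main", ["yes"])` traverses main, descends into confirm, and resumes main at its final situation, returning `"success"` together with the trace of visited situation instances.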
There are also four optional attributes:
This pipe mechanism is explicitly handled by SitLog's interpreter and the arguments' values are carried along by the interpretation process even if they are not explicitly stated in the specification of one or more situations.
The value of the
Finally, the attribute
During interpretation, the system keeps track of all concrete expectations met and actions performed by the robot, with the corresponding situation, and these objects are assembled in a structured list, which corresponds to the structure of the task. This structure is called the history of the task.
In summary, a DM stands for a schematic task and each DM instance unfolds according to the expectations met by the robot along the way, generating a concrete graph whose nodes are the actual situation instances and whose arcs correspond to the concrete interpretations and actions performed by the robot during the execution of the task. In this way, a DM specifies an implicit RTN that is explicitly rendered during the execution of the task, and the expressive power of the formalism corresponds at least to a push-down automaton, which is in turn equivalent to a context-free grammar; in this latter view, recursive situations correspond roughly to variables, modality specific situations to constant symbols and productions to the
SitLog's programs are executed by two interpreters that work in tandem. The first is the F-RTN interpreter which interprets DMs and situations and unfolds the recursive graph. For this, the F-RTN interpreter inspects the value of the situation's attributes, selects the expectation that is matched with the current perceptual interpretation, executes the corresponding actions and selects the next situation.
The situation's content is specified through expressions of a functional term-language that we call L.
4.2. The Functional Language L
Expressions of the language L are terms built out of constants, variables, grounded propositions, predicative functions and functions, following Prolog's term notation. The interpretation of expressions of L is performed by L's interpreter during the execution of each situation.
In the context of the situation, grounded terms stand for concrete expectations, actions or next situations; for instance, a constant or a grounded proposition may stand for a specific proposition expressed by the interlocutor, or for a concrete (fully determined) action, or for a specific next situation. Predicative functions resulting from the evaluation process may stand for a predication whose variables need to be instantiated by the perceptual interpreter out of the recognition of intentions expressed by the interlocutor, or out of the interpretation of natural but expected events in the world. The actions resulting from the evaluation can also be concrete and can be performed directly by the robot, or predicative functions with open variables that need to be further specified before these are sent to the actual rendering mechanisms. The next situations can be expressed by a constant or a grounded predicate (i.e., a situation with arguments), but can also be stated through functions, whose evaluation results in the actual next situation. In addition, the history can be accessed through a special predicate defined within a function's body, and it determines in part the function's value. Additionally, other deliberative resources and memory can be accessed through a similar mechanism.
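In particular, the last point above — a next situation stated as a function evaluated over the history — can be sketched as follows (a conceptual Python sketch, not L syntax; the situation names and history encoding are invented): the function inspects the interaction history and returns the actual next situation at run time.

```python
# Sketch: the next situation computed as a function of the interaction
# history, e.g. retrying a question at most twice before falling back.

def next_situation(history):
    """Return 'ask_name' while fewer than two failed attempts are on
    record, and a recovery situation 'spell_name' afterwards."""
    retries = sum(1 for (sit, expectation) in history
                  if sit == "ask_name" and expectation == "not_understood")
    return "ask_name" if retries < 2 else "spell_name"

# toy history: each entry pairs a situation with the expectation met there
history = [("greet", "hello"),
           ("ask_name", "not_understood"),
           ("ask_name", "not_understood")]
```

This kind of history-dependent choice is precisely what distinguishes a situation from an FSM state, whose transitions depend only on the current input symbol.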
4.3. Diagrammatic Notation
The graph of situation types has a diagrammatic representation, where nodes stand for situations and arcs are labelled with pairs of the form expectation:action.
For instance, the diagram in Figure 1 illustrates a dummy task with two DMs.

Figure 1. Graphical representation of the dummy application.
We also consider that a situation may have more than one instance; this is the case for situations with the same identifier but with different parameters or parameter values. This expressivity is useful for specifying modular or recursive behaviour (i.e., when one situation codifies the basic case and another the inductive pattern), where an arc labelled with an arbitrary
4.4. An Example Program in SitLog
In this section we illustrate SitLog with a program implementing the find task, in which a robot searches for a person, an object or a list of objects through a discrete path until such entity or list of entities is found or the path is exhausted; for this definition we assume that the robot is already in the initial position of the search path. This is a common behaviour required in many service robot tasks.

Figure 2. Diagrammatic representation of recursive DM structures.
We illustrate SitLog with the actual definition of the find DM.
The actual SitLog code for this DM is shown in Listing 1. The specification of the six situations is declarative and corresponds quite directly to the diagram. The arguments of the

Figure 3. Abstract decomposition of the find task.
The format of a call to the find DM is as follows:
This DM call specifies that the robot should search for three objects through a path constituted by four positions, and in each position must look at the current orientation of the neck (as a default convention) and also at the left and right. In addition, at each neck orientation it must look at the current tilt orientation, and also at −30 and −15 vertical degrees. In the execution of the path the robot searches in these three dimensions until a scene containing at least one of the sought objects is seen, and returns all objects in the scene that are specified as being sought in the variable
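The three-dimensional search regime just described (path positions × neck pan × tilt) can be sketched as follows (a conceptual Python sketch, not the SitLog DM; the `see_objects` perception call and all names are invented stand-ins for the robot's vision behaviour):

```python
# Sketch of the find search: visit each position on the path and, at each
# one, scan the default/left/right pan and the default/-30/-15 tilt angles
# until a scene containing at least one sought object is seen.

def find(sought, path, see_objects,
         pans=("default", "left", "right"), tilts=("default", -30, -15)):
    for position in path:
        for pan in pans:
            for tilt in tilts:
                scene = see_objects(position, pan, tilt)
                found = [obj for obj in scene if obj in sought]
                if found:
                    return position, found      # sought objects in view
    return None, []                             # path exhausted: failure

# toy world: the milk and juice become visible from p3 when looking left
def see_objects(position, pan, tilt):
    return ["milk", "juice"] if (position, pan) == ("p3", "left") else []
```

With `find(["milk", "coke"], ["p1", "p2", "p3", "p4"], see_objects)` the search stops at the first scene containing a sought object, returning the position and only those scene objects that were actually sought.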

Figure 4. Diagrammatic representation of the find DM.
In addition, a SitLog application, like the tests of the RoboCup Competition, requires that the global specification of all expectation and action names (e.g., the
5. Dialogue Models and the Cognitive Architecture
Within the present framework we have developed the interaction-oriented cognitive architecture (IOCA), which is centred on SitLog's interpreter as illustrated in Figure 5. IOCA has three main layers corresponding to recognition/rendering at the bottom level, interpretation/action-specification in the middle and expectation/action-selection at the top processing level. These three layers are involved in the main communication cycle. The bottom layer of the architecture consists of recognition modules, such as speech and vision, which translate the external information into the corresponding internal codes on the input side, and of the actual realization devices for navigation and manipulation behaviours on the output side. The middle layer on the input side corresponds to the interpreter that matches the expectations of the current situation, which are passed top-down from SitLog's interpreter, with the output of the recognition systems, which proceeds bottom-up, and produces the representation of the expectation that is met in the situation. Structural processes are defined at this level; for instance, in the case of linguistic interpretation, the output of speech recognition is a text string, and the interpreter includes the parser which produces a syntactic and semantic representation. However, perceptual interpretation also depends on the context set by the expectations and on the interaction history, so the output of the interpreter is a contextualized representation of the intention expressed by the interlocutor or a representation of an expected natural event. On the output side, the action scheme selected by SitLog is fully specified and realized by the corresponding rendering devices.
Listing 1. SitLog's specification of the find behaviour.
In actual implementations there is a particular interpreter for each situation type defined at the level of SitLog. Interpreters involve one or more modalities, and are also relative to a particular perspective or aspect of the world; for instance, there is a visual interpreter for face recognition and another for object recognition, even though both involve the visual modality. However, the output of perceptual interpretation is a propositional representation already independent of the particular vision algorithms and associated data structures. On the output side SitLog selects the action schemes, which are propositional and modality-independent representations, that need to be specified with the relevant modality-specific information and realized by the corresponding rendering devices.
IOCA also has a reactive behavioural cycle that establishes a link between recognition and rendering devices through the autonomous reactive systems, which are involved in the robot's reactive behaviour, as shown by the corresponding path in Figure 5. This cycle is embedded within the main communication cycle, as it does not involve interpretations and the construction of fully fledged representations; it consists mostly of signal processing mechanisms that operate much faster than the communication cycle. IOCA also involves a coordinator between communication and reactive behaviour which permits these two cycles to proceed concurrently.

Figure 5. The interaction-oriented cognitive architecture.
6. Practical Tasks and the Golem-II+ Robot
Service robots have the purpose of supporting people to perform common daily life tasks. In practical settings, such tasks have the purpose of achieving one or more goals through the execution of a number of actions, possibly in a previously unspecified order. Simple examples of these tasks are the tests of the RoboCup@Home competition.
For the construction of this kind of application our methodology distinguishes two main kinds of DMs: those targeted to model the structure of the practical task as a whole, like serving as a waitress in a restaurant, cleaning a room or assisting clients in a supermarket, and those directed to model general capabilities that can be used recurrently within different tasks, like learning a person's face, learning his or her name, finding people in a room, finding, grasping and delivering drinks, etc., which need to be coordinated in order to accomplish the goals of the task successfully. These latter kinds of actions are generic behaviours that need to be defined independently of task and domain, and constitute the library of behaviours that can be used systematically by full applications, like the find behaviour and its associated DMs, which constitute a particular library. SitLog and IOCA have been developed with the purpose of supporting the high-level declarative definition of practical tasks in terms of a library of behaviours.
In order to test the present framework, SitLog and IOCA have been implemented in the Golem-II+ robot. Next, we list the main functionalities used by SitLog behaviours through IOCA and their associated devices and algorithms:
Face detection and recognition are carried out by standard OpenCV functions, employing the Viola-Jones method [33] for detection and Eigenfaces for recognition [34].
Visual object recognition is performed using MOPED [35]; this framework uses different images of an object to create a 3D model based on SIFT [36].
Person tracking is performed via a module based on the OpenNI driver.
Speech recognition is carried out via a robust live continuous speech recognizer based on the PocketSphinx software [37], coupled with the Wall Street Journal (WSJ) acoustic models for English-speaking users, and the DIMEx100 acoustic models [38] for Spanish-speaking users. Hand-crafted language models for each task can be switched on, depending on the context of the dialogue. Noise filtering is carried out by estimating the noise spectral components via a quantile-based noise estimator [39] and subtracting them from the speech signal in a non-linear form.
Speech synthesis is produced with the Festival TTS package.
User localization is carried out via audio using a triangular microphone array audio-localization system [40].
Route planning is carried out via the Dijkstra algorithm [41] over a hand-crafted topological map of the environment.
Obstacle evasion is carried out via the nearness diagram [42] for long-range obstacles, and smooth nearness diagram [43] for close-range obstacles.
For object manipulation, two in-house 4-degree-of-freedom robotic arms are used. Arm control is carried out by estimating a constrained polar-coordinate plane and via coordinated inter-motor movement. The grippers are equipped with infrared sensors, which reactively make adjustments when taking objects, compensating for vision errors.
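The route planning item above can be illustrated with a short sketch (hedged: a generic Dijkstra over an invented topological map, not Golem-II+'s actual planner; all place names and edge costs are assumptions):

```python
# Sketch: Dijkstra's algorithm over a hand-crafted topological map, where
# nodes are named places and edge weights approximate travel cost.
import heapq

def plan_route(graph, start, goal):
    """Return (cost, path) of a cheapest route from start to goal."""
    queue, seen = [(0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in seen:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []                     # goal unreachable

# toy topological map (names and weights invented for illustration)
topo_map = {"kitchen": {"hall": 2},
            "hall": {"kitchen": 2, "living": 1, "door": 4},
            "living": {"hall": 1, "door": 1},
            "door": {}}
```

Here `plan_route(topo_map, "kitchen", "door")` prefers the kitchen–hall–living–door route of cost 4 over the direct hall–door edge of total cost 6.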
7. Conclusions
In this paper we have presented SitLog: a programming language and environment for behaviour engineering, supporting the specification and interpretation of service robot tasks in a simple and flexible way. The formalism has an associated diagrammatic notation that greatly facilitates the design and specification of complex tasks. The core computational mechanism consists of two interpreters working in tandem, one for interpreting the structure of the task and the other for interpreting content and control information. These two interpreters are implemented in Prolog, and programs in SitLog follow closely the Prolog notation, supporting the definition of applications in a declarative and compact form.
We have also introduced the notion of situation which is an information state containing the expectations and potential actions of a robotic agent in the context of the task structure. Situations are related in a recursive directed graph giving rise to the notion of DMs, which can also be seen as abstract behavioural models. Alternatively, DMs can be seen as schematic and parametric plans to achieve complex tasks. In addition, we have presented the notion of practical tasks, and the practical task and domain independent hypotheses, and suggest that the tests of the RoboCup@Home Competition and similar demonstration scenarios are instances of practical tasks.
SitLog permits defining a library of general robot behaviours, like finding people or objects in a room, interpreting speech input, navigating to a designated place, coordinating visual object recognition and manipulation, etc., that can be used by different applications on demand, in addition to the main DMs representing the structure of a service task. This modularity is particularly useful for the implementation of general service robots, which need to assemble task structures dynamically, by the composition and interpretation of complex DMs out of the basic DMs stated in advance. This functionality can also be seen as the dynamic construction and execution of parametric plan schemes out of the basic schematic plans.
These notions permitted the definition of the interaction-oriented cognitive architecture (IOCA). SitLog is at the heart of this architecture and relates the flow of perception and intentional actions, articulating the robot's main communication cycle, which subsumes reactive behaviour, on the one hand, and manages symbolic representations and deliberative resources, on the other. SitLog is also task and domain independent and can be easily ported to different robotic architectures and operating systems.
We have also presented the specification and interpretation of complex robotic behaviour, illustrating the expressive power of the formalism. In addition, we have implemented the full set of tests of the RoboCup@Home competition (Rulebook 2013).
The source code of SitLog's interpreter together with the DM library of generic behaviours presented in this paper are available at http://golem.iimas.unam.mx/sitlog/sitlog.tar.gz. A video of Golem-II+ executing a RoboCup@Home task fully written in SitLog can also be seen at http://youtu.be/99XhhEkyIz4.
