Introduction
Practical reasoning is concerned with deciding what to do, or justifying what one has done [28]. Agents need to use practical reasoning because they are situated in a changing world, are able to influence how the world changes, and have preferences between the states that those changes will lead to. Moreover, their ability to act is a limited resource, and so they may need to choose between several beneficial actions, actions which would improve the state from their point of view, so as to identify the best, or at least the one they most prefer. The focus of our work is on justifying reasoning about what to do and why, rather than on the specifics of how the chosen actions are carried out.
Normally there will be aspects of the current state that the agent likes, and aspects that it does not like. So, with respect to change, the agent will have four possible motivations:
To make something currently false true (we call this an achievement goal);
To make something currently true false (we call this a remedy goal);
To keep something currently true true (a maintenance goal);
To keep something currently false false (an avoidance goal).
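These four motivations can be captured as a small classifier. The sketch below is our own encoding (states modelled as sets of the propositions true in them), using the goal names that appear later in the paper:

```python
# Sketch: classify an agent's motivation for proposition g on the
# transition from state qx to state qy. States are modelled as sets of
# the propositions true in them (our own encoding).

def goal_type(g, qx, qy):
    """Return which of the four motivations applies to g on qx -> qy."""
    before, after = g in qx, g in qy
    if not before and after:
        return "achievement"   # make something currently false true
    if before and not after:
        return "remedy"        # make something currently true false
    if before and after:
        return "maintenance"   # keep something currently true true
    return "avoidance"         # keep something currently false false

# The bar example: slaking one's thirst is a remedy goal.
print(goal_type("thirsty", {"thirsty"}, set()))  # remedy
```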
What an agent wants can be specified at several levels of abstraction. Suppose an agent enters a bar on a hot day and is asked what it wants. The agent may reply:
I want to increase my happiness.
I want to slake my thirst.
I want a pint of lager.
The first reply relates to something which is almost always true, and for the sake of which other things are done. Normally there will be several things that can meet this objective. The second is a specific way of increasing happiness in the particular current situation: it is a remedy goal. There is an element of the current situation the rectification of which would increase the happiness of the agent. Again there are several ways of bringing this about. Finally the third reply identifies a specific way of remedying the situation: the agent selected a lager in preference to water, juice, etc. It is a specific condition under which the goal will be satisfied. Previous work such as [4] has used an argument scheme for practical reasoning, PRAS, which can be stated informally as follows.
In the current circumstances R, we should perform action A, which will result in new circumstances S, which will realise goal G, which will promote value V.
According to PRAS, then, an action is justified in terms of the circumstances in which it is performed, the goal it realises, and, ultimately, the value which that goal promotes.
This argument scheme, and a number of ways of challenging arguments made using it (so-called critical questions), were formalised in [4] using Action-based Alternating Transition Systems (AATSs).
Section 2 will give the basis of the formalisation in [4]. Section 3 will describe the limitations of the scheme proposed in [4]. Section 4 will extend the formalisation to enable some of these limitations to be addressed, and relate this to some other previous work in the literature. Section 5 will address the limitations with the new machinery, and illustrate the points with three detailed examples. Section 6 will offer some discussion and conclusions.
AATS with values
AATSs were originally presented in [35] as semantical structures for modelling game-like, dynamic, multi-agent systems in which the agents can perform actions in order to modify and attempt to control the system in some way. These structures are thus well suited to serve as the basis for the representation of arguments about which action to take in situations where the outcome may be affected by the actions of other agents. First we recapitulate the definition of the components of an AATS given in [35].
Definition (AATS). An Action-based Alternating Transition System (AATS) is a tuple ⟨Q, q0, Ag, Ac1, …, Acn, ρ, τ, Φ, π⟩, where:
Q is a finite, non-empty set of states;
q0 ∈ Q is the initial state;
Ag = {1, …, n} is a finite, non-empty set of agents;
Aci is a finite, non-empty set of actions for each agent i ∈ Ag, where Aci ∩ Acj = ∅ for all i ≠ j ∈ Ag;
ρ : AcAg → 2^Q is an action precondition function, which for each action α defines the set of states from which α may be executed;
τ : Q × JAg → Q is a partial system transition function, which defines the state τ(q, j) that results from the performance of joint action j in state q (partial, because not every joint action is possible in every state);
Φ is a finite, non-empty set of atomic propositions; and
π : Q → 2^Φ is an interpretation function, which gives the set of atomic propositions satisfied in each state.
Here JAg is the set of joint actions: a joint action j is a tuple ⟨α1, …, αn⟩ containing one action from Aci for each agent i ∈ Ag, and AcAg denotes the union of the sets Aci.
Definition (AATS+V).
Given an AATS, an AATS+V is defined by adding two additional elements as follows:
Av, a finite, non-empty set of values; and
δ : Q × Q × Av → {+, −, =}, a valuation function which defines the status (promoted (+), demoted (−) or neutral (=)) of each value v ∈ Av on the transition between two states.
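To fix intuitions, the components of an AATS+V can be rendered as a small data structure. The sketch below is our own illustration (the field and method names are assumptions, not the paper's notation):

```python
# Sketch (illustrative, not the paper's notation): an AATS+V as a plain
# data structure. Joint actions are tuples with one action per agent;
# delta labels each (state, state, value) triple with '+', '-' or '='.
from dataclasses import dataclass, field

@dataclass
class AATSV:
    states: set                    # Q
    initial: str                   # q0
    agents: list                   # Ag
    actions: dict                  # agent -> its set of actions (Ac_i)
    pre: dict                      # action -> states where executable (rho)
    trans: dict                    # (state, joint action) -> state (tau, partial)
    props: set                     # Phi
    interp: dict                   # state -> set of props true there (pi)
    values: set = field(default_factory=set)   # Av
    delta: dict = field(default_factory=dict)  # (qx, qy, v) -> '+', '-' or '='

    def next_state(self, q, joint):
        """Apply the partial transition function, if defined."""
        return self.trans.get((q, joint))

    def status(self, qx, qy, v):
        """Value status on the transition qx -> qy; '=' if unlabelled."""
        return self.delta.get((qx, qy, v), "=")
```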
PRAS can now be expressed using this formalism.
Definition (PRAS). In the current state qx ∈ Q, agent i ∈ Ag should participate in joint action j ∈ JAg, where the action performed by i in j is αi, and τ(qx, j) is qy, such that for the goal pa we have pa ∈ π(qy) and pa ∉ π(qx) (or pa ∉ π(qy) and pa ∈ π(qx)), and for some v ∈ Av, δ(qx, qy, v) is +.
In [4], seventeen potential ways to attack arguments made by instantiating PRAS were identified:
CQ1. Are the believed circumstances true?
CQ2. Assuming the circumstances, does the action have the stated consequences?
CQ3. Assuming the circumstances and that the action has the stated consequences, will the action bring about the desired goal?
CQ4. Does the goal realise the value stated?
CQ5. Are there alternative ways of realising the same consequences?
CQ6. Are there alternative ways of realising the same goal?
CQ7. Are there alternative ways of promoting the same value?
CQ8. Does doing the action have a side effect which demotes the value?
CQ9. Does doing the action have a side effect which demotes some other value?
CQ10. Does doing the action promote some other value?
CQ11. Does doing the action preclude some other action which would promote some other value?
CQ12. Are the circumstances as described possible?
CQ13. Is the action possible?
CQ14. Are the consequences as described possible?
CQ15. Can the desired goal be realised?
CQ16. Is the value indeed a legitimate value?
CQ17. Is the other agent guaranteed to execute its part of the desired joint action?
These critical questions were divided into three groups, each appropriate to a different stage in the reasoning: problem formulation, epistemic reasoning and choice of action.
Limitations of PRAS
PRAS has been used in a variety of contexts, including simple puzzle solving (e.g. [4]), law (e.g. [9,33]), medicine (e.g. [6]) and e-participation (e.g. [11]). It has also formed the starting point for the extensive investigation of reasoning with values in the work of van der Weide and his colleagues (e.g. [31]). None the less PRAS as formalised in [4] has some distinct limitations, including the treatment of goals, consideration of the effect of actions only on the next state, and the fact that many differences between agents are implicit in the formulation of the AATS. Perhaps the most important of these is the absence of a proper notion of goal from the AATS, and the consequent inability to explain the promotion of values in terms of goals. Problems relating to look ahead were considered in [5].
Whereas the informal version of the scheme links future circumstances, goals and values, the formal version does not, in that there is no clean separation of circumstances and goals. Goals even disappear altogether in some applications (e.g. [36]). Also in the formalisation of [4] values are simply labels on transitions, without any justification in terms of the change in circumstances resulting from the new state.
The problem with goals in the AATS is that states can only be described as assignments to the set of atomic propositions, Φ. This means that a goal can be no more than a subset of assignments to elements of Φ. Thus a goal can be satisfied in only one way, whereas the original intention was that a goal could potentially be satisfied in a variety of ways. Also, given the state, the conjunction specifying the goal is unarguably true or false, removing the possibility of arguing as to whether the goal is satisfied in a given state, and so losing much of the point of considering goals.
Four of the critical questions, CQ3, CQ4, CQ6 and CQ15 concern goals, and so the absence of goals from the AATS formalisation of [4] does not allow these to be properly expressed, goals there being treated only as subsets of Φ. And in relation to the values promoted by realising goals, in [4] differences in value promotion were considered to be expressed by different agents having different AATSs, without any explanation of the differences, or how they might be represented. This is a further limitation we will address in this paper.
Aside from the issues with goals described above, there is also the limitation of there being only a single step of look ahead. This means that actions performed in order to enable particular things to happen, or prevent things from happening, in the future, cannot be justified cleanly with PRAS. This limitation was addressed in [5]. The remainder of this paper is concerned with providing a means to specify goals, and link them to values, which will allow the proper expression of the critical questions mentioned above. In the next section we start to tackle this by extending the AATS formalism.
Extending the formalism
To allow us to express goals as more than simple assignments to atomic propositions in Φ, we introduce a set of intensional definitions, Θ, which can be regarded as a set of clauses, as defined below.
Definition (Clauses). Let Θ be a set of clauses of the form head ← b1 ∧ … ∧ bn, where the head is a term not occurring in Φ and each bi is a literal: an element of Φ, a defined term, or the negation of either.
Definition (Defined terms). Let D be the set of terms appearing as the head of some clause in Θ: these are the defined terms, and D is disjoint from Φ.
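Under this reading of Θ, whether a defined term holds in a state can be computed by evaluating its clauses. The sketch below assumes the clause shape given above and acyclic definitions (the vocabulary in the example is our own, anticipating the trains scenario):

```python
# Sketch (assumed clause shape): each clause in theta is (head, body) where
# body is a list of literals; a literal is ("pos", p) or ("neg", p). A
# defined term holds in a state if some clause for it has a satisfied body.
# Assumes definitions are acyclic, so the recursion terminates.

def holds(term, state, theta):
    """True if term is an atomic proposition true in state, or a defined
    term with some clause whose body literals all hold (recursively)."""
    if any(head == term for head, _ in theta):
        return any(
            all(holds(p, state, theta) == (sign == "pos") for sign, p in body)
            for head, body in theta
            if head == term
        )
    return term in state  # atomic proposition: look it up in pi(q)

# Hypothetical vocabulary: "crash" holds when both trains are in the tunnel.
theta = [("crash", [("pos", "e_in_tunnel"), ("pos", "w_in_tunnel")])]
print(holds("crash", {"e_in_tunnel", "w_in_tunnel"}, theta))  # True
print(holds("crash", {"e_in_tunnel"}, theta))                 # False
```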
Definition (Goals). A goal of an agent is a literal whose atom is drawn from Φ ∪ D: that is, a goal may be either an atomic proposition or an intensionally defined term.
As we saw in Section 1, goals require us to consider two states: the current state qx and the state qy which results from the transition.
If g does not hold in qx but holds in qy, then g is an achievement goal;
If g holds in qx but does not hold in qy, then g is a remedy goal;
If g holds in both qx and qy, then g is a maintenance goal;
If g holds in neither qx nor qy, then g is an avoidance goal.
The above can form the basis of necessary conditions for satisfying a goal of each kind. They are, however, only necessary conditions: as we will see, the goal must also be appropriately linked to a value.
Therefore what we need to do is to link goals to values. Recall that in the formalism of [4], values are used to label transitions between states. Recall also the valuation function δ of the AATS+V, which records whether a value is promoted, demoted or neutral on each transition.
Definition (Goals and value promotion). Let Δ be a logic program whose clauses link the status of values on a transition to the satisfaction of goals: each clause states that a value is promoted (or demoted) on the transition from qx to qy when its associated goal holds, or fails to hold, in qx and qy in the pattern of one of the four goal types. Thus, for example, an achievement goal will give a clause of the form: value v is promoted on the transition from qx to qy if g does not hold in qx and g does hold in qy.
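One way to operationalise such clauses is to store, for each rule, the value, its goal, the goal type and a sign, and check the goal-type pattern against the transition. This is a sketch of one plausible reading (the rule encoding and example vocabulary are ours, not the paper's):

```python
# Sketch: Delta rules as (value, goal, goal_type, sign) tuples; a rule
# fires on qx -> qy when the goal's truth values match the pattern of
# the named goal type.

PATTERNS = {                     # (g holds in qx, g holds in qy)
    "achievement": (False, True),
    "remedy":      (True, False),
    "maintenance": (True, True),
    "avoidance":   (False, False),
}

def value_status(rule, qx, qy, holds):
    """Return (value, sign) if the rule's goal-type pattern matches the
    transition qx -> qy, else None. `holds` tests a goal in a state."""
    v, g, kind, sign = rule
    if (holds(g, qx), holds(g, qy)) == PATTERNS[kind]:
        return (v, sign)
    return None

# Hypothetical rules: Progress promoted by achieving "moved"; Safety
# demoted when "crash" is achieved.
holds = lambda p, q: p in q
rules = [("Progress", "moved", "achievement", "+"),
         ("Safety", "crash", "achievement", "-")]
qx, qy = set(), {"moved"}
print([value_status(r, qx, qy, holds) for r in rules])
# [('Progress', '+'), None]
```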
In our previous work [4] we indicated that differences in value promotion were considered to be expressed by different agents having different AATSs. The set up we have presented above will make the promotion of values an objective matter, the same for all agents. If we wish to make what counts as promoting a value a subjective matter this can be done in several ways, depending on whether we wish to allow disagreement as to what counts as a value, what counts as promotion, or both. To allow agents to disagree on what counts as a value, each agent i will have its own set of values, Avi; to allow disagreement on what counts as promotion, each agent will have its own value theory, its own version of Δ.
Note that we could also allow agents to have their own versions of the definitions in Θ, so that each agent had its own program combining its own Θ with its own Δ.
Relation to other work
Now that we have spelled out the details of our approach, before we demonstrate it through application to different scenarios we compare and relate the approach to other work on practical reasoning in multi-agent systems. Argumentation has been used as a basis for a number of different proposals for how to handle practical reasoning and decision-making in agent systems, for example, [2,15,18,26,32]. For our discussion here we compare our work with a general model for agent reasoning (the BDI model), given in textbooks such as [34] and another approach [26] that is specifically grounded in argumentation.
BDI models
One very common way of representing agents is using the Belief-Desire-Intention (BDI) Model (e.g. [27,34]). In this model agents have sets of beliefs and desires and commit to particular desires according to their current circumstances, so that these desires become intentions, which they then attempt to realise. Typically desires are filtered into candidate intentions, those that the agent can currently accomplish, and the intentions are selected from these. For simplicity we will assume here that an agent must select one and only one candidate as its intention. Here we discuss only the basic BDI systems as presented in e.g. [34]. There are, of course, many variants on BDI (e.g. [12] and [29]), but here we can consider only the core idea.
In our model we justify desires, which are particular states of affairs, and which apply in some states and not in others, by using values, which are persistent aims and aspirations. The beliefs of an agent are given by the state in which the agent believes itself to be (i.e. qx), together with its model of the transitions available from that state.
Another approach to practical reasoning was proposed in [26]. Here the agents also have a set of desires, together with desire generation rules and planning rules for achieving them.
Both desire generation rules and planning rules can be seen to have their equivalents in our framework. Given an agent with a set of values and a value theory Δ, the clauses of Δ play the role of desire generation rules, generating goals from the current state, while the AATS supplies the counterpart of planning rules, identifying the actions which will realise those goals.
Summary of comparison with related work
The key feature of our approach compared with the more traditional approaches is that we have used the idea of values to justify what an agent desires in particular situations. Now desires, rather than comprising a fixed set of states of affairs which the agent wishes to achieve (perhaps, as in [20], supplemented by some additional desires derived from basic desires and the current situation), are instead derived according to the nature of the agent and the particular context. This means that they can be justified by pointing towards the values promoted by moving from the existing state to a new state, rather than being unchallengeable givens for the agent.
In BDI there is often some equivocation as to whether desires and intentions are states of affairs or actions. Here the relationship is clear: desires are states of affairs, justified by the values their achievement would promote, while intentions correspond to the actions chosen to bring those states of affairs about.
We believe that the ability to give reasons for why we find particular states of affairs desirable is important:

in each art the good is that for the sake of which other things are done … in medicine this is health, in generalship victory: in every action and decision it is the end. ([25]: 1097a18–21)

Honour, pleasure, understanding and every virtue we certainly choose because of themselves, since we would choose each of them even if it had no further result. ([25]: 1097a28–b5)
We believe that the values used in our approach are a better reflection of this characterisation of practical reasoning than are the desires of the BDI approach.
Richer practical reasoning
Returning to the exposition of our approach, we now examine how our additional machinery allows the proper expression of the critical questions relating to goals.
In our original account in [4] a goal was simply a particular assignment to a subset of the propositions in Φ. With the machinery of Section 4, a goal may instead be an intensionally defined term from Θ, satisfiable in several different ways, and whether it is satisfied in a given state may itself be a matter for argument.
CQ4 disputes whether a value is promoted by a goal, and hence, whereas in [4] CQ4 was a simple question of the sign returned by the valuation function δ, it can now be argued on the basis of the clauses of Δ which link the satisfaction of goals to the promotion of values.
In [4], posing CQ6 merely required there be an alternative action which realised the desired conjunction of atomic propositions. Now we have extended the notion of goal to include intensionally defined goals, and we allow different agents to define these terms differently. Moreover, as we saw in Section 4, achieving the requisite state of affairs is only a necessary condition for achieving the goal. Thus while an alternative way to satisfy this condition does indeed allow CQ6 to be posed successfully, we also need to show that the required link to values is also realised: that there is some value which, according to Δ, the alternative transition promotes.
Finally, CQ15 in [4] concerned only whether the atomic propositions in the conjunction are co-tenable, whether they can occur together in some state of the AATS. With intensionally defined goals we must also ask whether the goal can be satisfied at all under the definitions in Θ, and whether its satisfaction would promote the value in question according to Δ.
We will now look at some examples illustrating the use of our extensions.
Trains and tunnels
We begin with the example used to illustrate the original AATS as introduced in [35]. There are two trains, one of which (E) is Eastbound, the other of which (W) is Westbound, each occupying their own circular track. At one point, both tracks pass through a narrow tunnel and a crash will occur if both trains are in the tunnel at the same time. Each train is an agent (i.e. Ag = {E, W}).

[Fig. 1. AATS for Trains scenario. AW = east away, west waiting, etc.]
We define one term in Θ. We say that there is a crash if both trains are in the tunnel at the same time.
The agents will have two values: Progress and Safety, the first promoted by moving, the second demoted by a crash. Thus the basic Δ is the same for both agents in this case: Progress is promoted by a transition in which the agent moves, and Safety is demoted by a transition which results in a crash.
Each agent can now form its own arguments for action using PRAS.
Now we can see that Progress will motivate movement to the next state through an achievement goal, while Safety will motivate idling as an avoidance goal in the state where both trains are waiting. For simplicity we express arguments only in terms of the current state, supported action and value.
Arg1: In a state where I am waiting I should move to promote Progress.
Arg2: In a state where I am waiting I should idle to promote Safety.
If we now assume that trains prefer Safety to Progress, they will both prefer Arg2 and so they will get stuck in waiting, since Arg1 will be open to an objection based on CQ9:
Obj1: In a state where I am waiting, I should not move since this may demote Safety.
and Arg1 will be defeated, given this preference for Safety.
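The deadlock just described can be reproduced in a small sketch. The encoding of arguments, objections and value preferences below is our own hypothetical illustration, not machinery from the paper:

```python
# Sketch: each argument is (action, value); an objection (from CQ9)
# attacks an action by citing a value it risks demoting. An argument
# survives unless the objection cites a value the agent prefers to the
# argument's own value. Assumes at least one argument survives.

def choose(arguments, objections, preference):
    """preference: list of values, most preferred first."""
    def rank(v):
        return preference.index(v)
    surviving = [
        (action, v) for action, v in arguments
        if action not in objections or rank(objections[action]) > rank(v)
    ]
    # pick the surviving argument whose value is most preferred
    return min(surviving, key=lambda a: rank(a[1]))[0]

args = [("move", "Progress"), ("idle", "Safety")]   # Arg1, Arg2
obj = {"move": "Safety"}                            # Obj1, from CQ9
print(choose(args, obj, ["Safety", "Progress"]))    # idle: the deadlock
print(choose(args, obj, ["Progress", "Safety"]))    # move
```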
Suppose, however, we add a clause to Θ defining a new term, safe to enter, which holds when the other train cannot enter the tunnel at the same time.
Now we revise the value theory so that Progress is promoted by entering the tunnel only when it is safe to do so.
Now this additional knowledge that Progress will not be promoted by entering the tunnel unless it is safe to do so will mean that the agent will instead argue:
Arg3: In a state where it is safe to enter and I am waiting I should move to promote Progress.
This argument is not subject to Obj1, since when it is safe to enter no crash can result. It will not, however, ensure deadlock is avoided: Obj1 still applies to Arg1, and in the state where both trains are waiting neither agent knows that it is safe to enter.
For an objection such as Obj1, arising from CQ9, to be effective, three things are needed: the other train must be able to move, the resulting joint action must lead to a crash, and, as noted, Safety has to be preferred to Progress. Thus deadlock could be avoided if the agent had a satisfactory answer to the critical question:
Can you be sure that the other train will not move?
If, however, the agent can be sure that the other train will not move (for example, because it knows that the other agent prefers Safety and so will idle), then Obj1 fails: it is safe to enter, and the agent can move, breaking the deadlock.
To summarise: this example illustrates:
the use of Θ to define non-atomic propositions, such as safe to enter;
how the goals and the desirable transitions can be generated from the values of the agents, and how the value preferences allow a particular transition to be chosen. Both these aspects are open to debate within our framework. This movement from value to chosen action corresponds to the identification and filtering of candidate intentions in the BDI model.
how differences between agents can be expressed using different perceptions of how values are promoted, and how these can help to resolve deadlocks, especially if the agents are themselves aware of these differences.
Mary and Jane

For our second example we look at the choice between career and domestic life. We assume two agents, Mary and Jane.
We will consider only what is necessary for the example: some larger system may be assumed. In this example there is no interaction between the agents, and so we can consider them separately. Accordingly the AATS will be applicable to both Mary and Jane (both agents thinking of themselves as “me”, hence suffix “m”, and their partner as “p”). The action options are to pursue a career or to devote oneself to domestic life.
We now introduce some defined terms in Θ, among them rich, which is used below in the promotion conditions for Money.
The relevant AATS fragment is shown in Fig. 2.

[Fig. 2. AATS for Mary and Jane scenario.]
Now we define Δ. Both Mary and Jane have the values of Money, promoted by being rich, Relationship, promoted by maintaining their partnership, and Happiness.
Note that here the agents differ on Happiness: whether a transition promotes or demotes it depends on whether the agent finds career or domestic life the more fulfilling.
Now consider the choice that Mary and Jane must make. We can now see that Money will be promoted for both by pursuing a career.
Now suppose that Mary wanted to challenge Jane’s decision. Although both are entirely agreed on the problem formulation, share the same values, and have used the AATS correctly in accordance with their own preferences, they disagree on whether Happiness will be promoted or demoted (CQ4 from [4]) and so label the transitions of the AATS differently.
So far we have, like previous work such as [4], used an ordering on values to adjudicate conflicts between arguments. Our new machinery for considering goals, however, offers an opportunity for a different basis for choice. If we just consider the program Δ, we can instead give arguments based on avoiding the demotion of a value priority over arguments based on promoting a value: that is, we can implement loss aversion.
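Loss aversion as a ranking principle can be sketched as follows. The tagging of arguments and the action names are our own illustrative assumptions:

```python
# Sketch: rank arguments so that those avoiding a demotion outrank those
# achieving a promotion, regardless of the values involved. Python's sort
# is stable, so ties keep their original order.

def loss_averse_order(arguments):
    """arguments: list of (action, value, kind) with kind in
    {'avoid-demotion', 'achieve-promotion'}; losses sort first."""
    return sorted(arguments, key=lambda a: 0 if a[2] == "avoid-demotion" else 1)

# Hypothetical encoding of the career/domestic choice:
args = [("career", "Money", "achieve-promotion"),
        ("home", "Happiness", "avoid-demotion")]
print(loss_averse_order(args)[0][0])   # home: the loss-avoiding option wins
```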
We have used this example to illustrate in particular:
The effect of agents having different ways of promoting their values;
The possibility of using a general principle such as loss aversion to rank arguments, to complement or even replace value orderings.
Working hours

Our third example will illustrate the distinction used in [10] between values which have no effect after they have reached a certain threshold (values which agents satisfice) and values for which more is always welcome (values which agents maximise).
This example concerns an agent trying to strike an appropriate balance between work and leisure. Employees often have some say over how many hours they will work, and may choose the extra leisure or the extra money according to their individual preferences. We model this situation in the AATS fragment of Fig. 3. The propositions of interest are the number of hours worked and the hourly rate of pay.

[Fig. 3. AATS for Working Hours scenario.]
In Θ we define income as the product of the hours worked and the hourly rate.
Often economics assumes that when people make decisions they always prefer more of a good. Empirical work, however, suggests they can instead be seen as satisficers: once a sufficient amount of the good, a threshold, has been attained, more of it ceases to motivate.
Now consider possible rules for Δ. We have two values, Money and Leisure. Because maximisers always prefer more of a good, for a maximiser Money is promoted by any transition which increases income and demoted by any which decreases it, and similarly Leisure is promoted and demoted by increases and decreases in hours of leisure. Call these rules M1–M4.
But for satisficers, the rules will refer to a threshold: a value is promoted only by a transition which takes it from below its threshold to at or above it, and demoted only by a transition which takes it from at or above its threshold to below it. Call these rules S1–S4.
Agents may mix and match these rules: they may be maximisers for both values (M1–M4), satisficers for both values (S1–S4), money maximisers and leisure satisficers (M1, M2, S3 and S4), or money satisficers and leisure maximisers (S1, S2, M3 and M4). We will term the agents MMLM (i.e. money maximiser and leisure maximiser), MSLS, MMLS and MSLM respectively. Loss aversion can be effected by giving the demotion rules priority over the promotion rules. Where they are maximisers of one value and satisficers of another they may tend to prefer satisficing to maximising. Thus a money satisficer and leisure maximiser will order the rules M4, M3, S2 and S1. Alternatively the agent may express its preferences in terms of values rather than the general principles. We will represent this value preference by the order of the values and qualifiers: thus in the case of the double maximiser, MMLM prefers money to leisure and LMMM prefers leisure to money.
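The contrast between the two kinds of rules can be sketched as follows. This is an assumed reading of M1–M4 and S1–S4; the function names and figures are ours:

```python
# Sketch: a maximiser gains an argument from any increase and against any
# decrease of a quantity; a satisficer cares only about crossing its
# threshold. Returned signs follow the AATS+V convention: '+', '-', '='.

def maximiser_status(before, after):
    return "+" if after > before else "-" if after < before else "="

def satisficer_status(before, after, threshold):
    if before < threshold <= after:
        return "+"               # threshold newly satisfied
    if after < threshold <= before:
        return "-"               # threshold newly unsatisfied
    return "="                   # no change in threshold satisfaction

# Hourly rate 10, moving from 8 to 9 hours: income 80 -> 90.
print(maximiser_status(80, 90))        # '+': a money maximiser approves
print(satisficer_status(80, 90, 70))   # '=': a satisficer (threshold 70) is indifferent
print(satisficer_status(80, 90, 85))   # '+': unless 85 was its threshold
```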
Now consider the AATS fragment in Fig. 3, where the hourly rate is fixed at 10. Agents may choose to increase their hours, or stay the same (we assume that the employer is not making the reduced hours option available: the symmetry means that we can make this simplification without loss of generality). Suppose working eight hours is the initial state. Let us now consider our various agents in turn. We will assume that agents are not loss averse in general, but will not wish to fall below a satisfied threshold for one value for the sake of improving the other, even where that other value is preferred.
The satisficers will have different choices according to their thresholds. There are four situations: both thresholds are satisfied, only the money threshold is satisfied, only the leisure threshold is satisfied, and neither threshold is satisfied.
[Table. Choices of maximising and satisficing agents with various preferences.]
The double maximiser will have an achievement goal based on money to increase its hours, but an avoidance goal based on leisure to refuse the extra hours. Since thresholds are not applicable to these agents, the choice will be based on value preferences: MMLM will increase its hours and LMMM will keep them the same.
The double satisficer will only have arguments to increase its hours when the money threshold is not already satisfied, and will only have arguments against increasing its hours where the leisure threshold will cease to be satisfied as a result. Only where neither threshold is satisfied in both states will the preference between the values determine the action.
The money maximiser and leisure satisficer will always have a reason to increase its hours but will only have a reason not to increase its hours if this would take it below its leisure threshold. Here the value preference makes a difference only if the leisure threshold is unsatisfied in neither state. Similarly the value preference makes a difference to the money satisficer and leisure maximiser only where the money threshold is satisfied in neither state.
What this shows is that we can get a variety of behaviour, even when agents share value orderings. Whereas maximisers will always act in accordance with the pure value preference, the influence of this preference decreases when the agents are satisficing values.
This also has implications for an employer who wishes to encourage staff to work overtime. The obvious course would be to increase wages. But suppose we assume that all staff currently satisfy their thresholds. Now an increase in hourly rate will only attract money maximisers, and, where the additional hours would take the agent below their leisure threshold, not even these. Note that in this case, where both thresholds are satisfied initially, the value preferences of the workers do not make a difference at all: money maximisers will accept the extra hours only if it does not jeopardise their leisure threshold, and money satisficers will not be interested. Worse for the employer is that the increase in wages may enable leisure maximisers to reduce their hours while keeping above their money threshold, and so the increased wage will result in fewer hours worked by such agents. It is probably for this reason that overtime hours are often offered at a premium rate, not applicable to the basic hours. But the effect may still cause problems with staff with no standard hours, for example, casual bar staff.
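The effect described for leisure maximisers can be checked with a small calculation. The threshold figure here is hypothetical:

```python
# Sketch: a leisure maximiser with a money threshold works the fewest
# hours that keep income at or above that threshold. Raising the hourly
# rate can therefore reduce the hours such an agent works.
import math

def hours_needed(rate, money_threshold):
    """Smallest whole number of hours with hours * rate >= threshold."""
    return math.ceil(money_threshold / rate)

print(hours_needed(10, 75))   # 8 hours at rate 10 (hypothetical threshold 75)
print(hours_needed(12, 75))   # 7 hours after a raise: fewer hours worked
```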
Perversely, the employers may be able to attract more employees to overtime by reducing the hourly rate, since money satisficers would then need to work additional hours to keep their income at or above their threshold.
To summarise this example, it illustrates:
The idea that the motivation offered by promoting a value may change according to the current situation.
The distinction between motivating values, which are always prized, and values which agents require up to a sufficient level, but which they do not value beyond that.
That value preferences may play a secondary role to the need to attain satisfactory levels for the various values.
Discussion and conclusions

In this paper we have revisited an account of practical reasoning using arguments and values to consider its limitations and provide mechanisms to overcome these. In the account of [4], goals were no more than assignments to subsets of the atomic propositions Φ; here we have provided intensional definitions which allow goals to be satisfied in several different ways, and a value theory which links the satisfaction of goals to the promotion of values.
Moreover, as we have seen from the examples, we can use this machinery to model aspects affecting choice other than value preference, allowing us to model some quite sophisticated reasoning, such as loss aversion and the difference between values the agent wishes to maximise and those it wishes to satisfice. The examples we have provided in Section 5 are intended to motivate the need for our refinements and demonstrate how they work in different applied reasoning scenarios, each with features of interest which necessitate the ability to make these distinctions.
The new notions in our account as presented in this paper form part of a larger body of work intended to increase the expressiveness and improve our account of practical reasoning. In [4] the argumentation considered only the next state, and so was unable to express arguments based on the need to reach a state from which a particular value could be promoted, or to avoid states in which the demotion of values became inevitable. This was addressed in [5].
We further envisage our new work on practical reasoning as being expressed using appropriate argumentation schemes that can themselves be formalised in a suitable framework, such as ASPIC+ [24], so that desirable properties, such as the satisfaction of rationality postulates, e.g. [13], can be shown to hold. Furthermore, we intend to look at proofs of correspondences between our approach and others, such as the BDI approach. The work set out in this paper provides the essential basis which will enable us to tackle all these issues.
