Abstract
1. Introduction
Classification is the act of allocating an entity to a category on the basis of its properties. In this paper, we achieve optimal classification by judiciously making kinematic decisions for a mobile agent so as to minimize the expected risk of the classification outcome. The risk of each classification outcome is determined by a loss function, whose weight is dictated by the preferences of a strategist (or policy maker) who defines the mission objective. The approach is parametric in that the loss function is treated as a design variable of the mission, and we use Bayesian inference to process the collected information. The problem is solved sequentially, so that the solution provides a sequence of kinematic decisions that constitute a path.
1.1. Motivation
Unmanned Aerial Vehicles (UAVs) have proved to be an invaluable force multiplier for the Joint Force Commander (JFC) [1]. The UAV market is predicted to more than double over the next decade [2], and UAVs can provide both persistent and highly capable intelligence, surveillance and reconnaissance (ISR) [1]; ISR capability is therefore the number one combatant commander priority for UAVs in the U.S. Army [3].
Every military operation has a strategist who decides the desired strategic goals of the mission, where the goals are to be achieved by applying military resources and means, such as UAVs. The actual strategy must be incorporated effectively so that all the resources are fully exploited in the service of the strategic goals. The motivation of this work is to optimize the couplings between several operational components: beginning with the decisions of the strategist, continuing with the kinematic decisions of the mobile agent, followed by the information collected by the sensors carried by the agent, and ending with the classification decisions taken by the agent.
1.2. Mission Overview
In a given area, there are a number of objects of interest whose locations are known but whose categories (i.e., target or non-target) are unknown.
1.3. Literature Review
Much work has been done previously under the broad topic of probabilistic decision making in UAV operations. In [4], the authors investigated the use of human operator feedback for target recognition in an ISR scenario where a team of Micro Aerial Vehicles (MAVs) is assigned to fly over a number of objects of interest and the operator must decide whether or not each object is a target. In [5], the authors presented decision making strategies under uncertainty and adversary action for the Cooperative Operations in UrbaN TERrain (COUNTER) program, where stochastic dynamic programming was employed to optimize the resource allocation. In [6], the problem of path planning for a UAV in the presence of radar-guided surface-to-air missiles was investigated. The radar model was formulated probabilistically, and the optimal strategy obtained in that work confirmed standard flying tactics: an aircraft should “deny range, aspect and aim”.
With an information-theoretic measure, one can quantify information using the probabilities associated with the outcomes of the situation at hand. In [7], the author investigated the use of Shannon's information [8] for performance and requirement estimation in multisensor fusion applications; a set of heuristics was developed to relate the information content and the recognition performance of a sensor system using Johnson's criteria [9]. In [10], optimal sensor parameter selection in a static visual system was studied, with the optimality criterion being the reduction of uncertainty in the state estimation process as measured by Shannon's mutual information. In [11], sensor management in a dynamic environment was examined, where the authors used an active sensing approach that combines particle filtering, predictive density estimation, and relative entropy maximization.
1.4. Original Contributions
The original contributions of this work are as follows.
We propose a model that accounts for the interaction between the agent kinematics, informatics, classification and strategy.
Within this model, we pose and solve a sequential decision problem that accounts for strategist preferences; the solution yields a sequence of kinematic decisions for a moving agent.
The solution of the sequential decision problem yields the following flying tactic: “approach only objects whose suspected identity matters to the strategy”. This tactic is numerically illustrated in several scenarios.
1.5. Relevance to Past Work
Previously, we posed a problem where the mission of the unmanned vehicle was to travel through a given area and collect a specified amount of information about each object of interest while minimizing the total mission time [12]. Shannon's channel capacity [8] was utilized as a model for information collection. An optimal control problem was posed and the necessary conditions for optimality of the path were analytically derived. Furthermore, numerical results were illustrated on several time-optimal cooperative exploration scenarios. This method was developed for vehicles equipped with range-based sensors and travelling in an uncertain area [13]. In [14], time-optimal paths are generated for a vehicle to collect specified amounts of information about objects of interest at known locations using noisy anisotropic sensors. Trajectories are sought to improve information collection by finding trade-offs amongst the couplings between kinematics, informatics and estimation; the approach was validated by a hardware demonstration using a three-wheeled ground robot.
In the presence of multiple non-isotropic objects whose information collection rates are not constant with respect to the relative positions between the agent and the object, finding a feasible path that satisfies the mission specification can be challenging. The bond energy algorithm [15] is an optimization heuristic that iteratively improves infeasible path conditions to feasible ones for multiple non-isotropic objects. In [16], we investigated the path planning strategies of a Dubins vehicle with several (range-based) sensor configurations. In [17, 18], we studied the optimal informative path planning with Bayesian classification that accounts for the couplings between kinematics, informatics and classification.
The presented work extends our previous work: we investigate the path planning of a mobile agent that accounts for the couplings between the agent kinematics, informatics, classification and strategy.
1.6. Paper Outline
The remainder of the paper is organized as follows. In Section 2, the models for kinematics, informatics and classification are presented, and the risk function that is used as the cost function is formulated using the notion of a confusion matrix. In Section 3, an optimization problem is formulated in which the risk function is minimized with respect to the kinematic and classification decisions. In Section 4, the risk function for the different outcomes of the action variable is derived, and loss function candidates for mission scenarios of interest are presented. In Section 5, numerical simulation results are given. The conclusion and future work are discussed in Section 6.
2. Modelling
The agent comprises three relevant subsystems: kinematics, informatics and classification. In this section, we present the model for each subsystem. The strategy is treated separately from the agent's subsystems (Sec. 2.5), as it defines the mission objective.
2.1. Kinematics
We assume that the physical world that the agent explores is a two-dimensional Manhattan-gridded space. Let us define a decision variable that selects, at each time step, one of the five available kinematic decisions: move up, down, left or right, or stay.
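As a concrete sketch of this kinematic subsystem, the following fragment implements movement on a Manhattan grid with the five decisions listed above; the coordinate convention and function names are our own illustrative assumptions, not the paper's notation.

```python
# Hypothetical sketch of the Manhattan-grid kinematics described above.
# The decision variable takes one of five values; names are illustrative.
MOVES = {
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
    "stay": (0, 0),
}

def step(position, decision):
    """Advance the agent one grid cell according to the kinematic decision."""
    dx, dy = MOVES[decision]
    return (position[0] + dx, position[1] + dy)

def manhattan_range(agent, obj):
    """Range between the agent and an object of interest on the grid."""
    return abs(agent[0] - obj[0]) + abs(agent[1] - obj[1])
```

The time history of `step` calls traces out the agent's path, and `manhattan_range` supplies the range on which the likelihood functions of Sec. 2.2 depend.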
2.2. Informatics
We define two events, which are the property of the object (X) and the measurement type (Y), respectively. For the object property X, there are two possible outcomes: the object is either a target or a non-target.
Measurement characteristics
2.2.1. Property of Object
The probability of the object being a target and the probability of the object being a non-target are defined as
where
2.2.2. Likelihood Functions & Constraint
Given the two events,
where 0 ≤
Based on the fact that the probability of the entire sample space must equal one, i.e.,
This is a constraint that the likelihood functions must satisfy in order to be proper probability distributions.
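This normalization can be checked programmatically. The sketch below assumes three discrete measurement types; the labels and probability values are illustrative, not the calibrated likelihoods of the paper.

```python
def check_likelihoods(likelihoods, tol=1e-9):
    """likelihoods maps each object property X to a dict of P(Y = y | X).
    For each X, the probabilities over all measurement types must sum to 1."""
    for x, dist in likelihoods.items():
        total = sum(dist.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"P(Y | X={x}) sums to {total}, not 1")

# Illustrative numbers only (not calibrated values from the paper):
likelihoods = {
    "target":     {"y1": 0.6, "y2": 0.3, "y3": 0.1},
    "non-target": {"y1": 0.1, "y2": 0.3, "y3": 0.6},
}
check_likelihoods(likelihoods)  # passes: both rows sum to one
```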
2.2.3. Modified Performance Prediction Model
A performance prediction model for Electro-Optical (E-O) sensors is used as the measurement model for the agent. The modified performance prediction model (MPPM) for E-O systems is defined as the product of the target transfer probability function and the cross section probability.

Probability of discrimination tasks (i.e., detection, recognition and identification) with respect to range (in unit length)
The probability of successful discrimination depends not only on the range, but also on the orientation of the object. An object can be observed from various viewpoints, but only from the vantage point can the proper feature that classifies the property of the object be seen. This is analogous to observing a die: some of its faces carry an ordinary pattern (non-feature) while the others carry particular ones (feature). Since the orientation of the object is unknown
with 0 ≤
The completed modified performance prediction model describes the probability of successful discrimination as a function of the range (
These likelihood functions are known functions obtained by calibration prior to the mission.
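The calibrated forms are not reproduced here, but a logistic-style target transfer probability function, common in the E-O sensor-performance literature, can serve as an illustrative stand-in. The parameters r50 (the range at which the discrimination probability is 0.5) and the steepness exponent E are assumptions, as is the constant cross-section probability.

```python
def ttpf(r, r50, E=2.7):
    """Illustrative target transfer probability function: probability of
    successful discrimination as a function of range r. Assumed logistic
    form; r50 and E are stand-in parameters, not calibrated values."""
    n = r50 / max(r, 1e-9)  # resolvable detail grows as the range shrinks
    return n**E / (1.0 + n**E)

def mppm(r, r50, p_cross):
    """Modified performance prediction model: product of the target
    transfer probability and the cross-section probability p_cross,
    which accounts for the unknown orientation of the object."""
    return ttpf(r, r50) * p_cross
```

The product structure mirrors the definition above: `ttpf` captures the range dependence, while `p_cross` discounts for the chance that the classifying feature is not visible from the current viewpoint.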
2.3. Classification
2.3.1. Snapshot
Bayes' theorem gives the posterior probability of the object property given the measurement
where
2.3.2. Subsequent measurements
Bayes' theorem still holds for sequences of evidence. Assuming that evidence is conditionally independent of evidence at other instants given X, we get
where the superscript on
Suppose that among the total
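The snapshot and sequential updates above amount to a recursive Bayes update of P(X = target); a minimal sketch, assuming conditionally independent measurements and illustrative likelihood values:

```python
def bayes_update(prior_target, likelihood, y):
    """One recursive Bayes update of P(X = target) after observing
    measurement type y, where likelihood[x][y] = P(Y = y | X = x)."""
    num = likelihood["target"][y] * prior_target
    den = num + likelihood["non-target"][y] * (1.0 - prior_target)
    return num / den

# Illustrative likelihoods and a short measurement sequence:
likelihood = {
    "target":     {"y1": 0.6, "y2": 0.3, "y3": 0.1},
    "non-target": {"y1": 0.1, "y2": 0.3, "y3": 0.6},
}
p = 0.5  # prior P(X = target) before any measurement
for y in ["y1", "y1", "y2"]:
    p = bayes_update(p, likelihood, y)
# Two target-like observations drive the posterior well above the prior;
# "y2" is uninformative here since both hypotheses assign it 0.3.
```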
2.4. Interaction between Subsystems
The agent comprises three subsystems: kinematics, informatics and classification. The kinematics subsystem describes the physical representation of the agent in the actual physical world that has several objects of interest in it. The important information that the agent gets from this subsystem is the relative distance, or the range, between the agent and the object of interest. In addition, the agent decides which direction to go in this subsystem, and the time history of these decisions creates the path of the agent. The informatics subsystem describes how the agent extracts information out of the object of interest. The extracted information is abstracted by three levels of measurement characteristics, denoted as

Block diagram of interconnected subsystems
Kinematic decisions affect the classification result because the likelihood functions, Eq. (7), are dependent on the range between the agent and the object, and the
2.5. Strategy
In this section, we define the Bayes risk as the cost function for the dynamic optimization problem. First we define the confusion matrix, which is shown in Table 2 (OOI: Object Of Interest). There are four possible outcomes.
Confusion matrix
The probability and the loss function corresponding to each outcome are defined as
The first equation is the potential risk of making a decision when
Without loss of generality, let the loss functions for the correct outcomes (true positive and true negative) be zero.
Then the risk function takes the form of
This is called the conditional risk function. Note that the risk function varies depending on the type of observation,
Bayes risk is the averaged risk function over the distribution of the random variables
where
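A minimal sketch of the conditional risk and the induced minimum-risk classification, assuming binary hypotheses and zero loss for correct outcomes (a standard normalization); the loss values below are illustrative:

```python
def conditional_risk(p_target, decide, L_fp, L_fn):
    """Conditional risk of a classification decision given the current
    posterior p_target = P(X = target | measurements), with zero loss
    assumed for correct decisions."""
    if decide == "target":
        return L_fp * (1.0 - p_target)  # risk of a false positive
    return L_fn * p_target              # risk of a false negative

def bayes_decision(p_target, L_fp, L_fn):
    """Pick the hypothesis with the smaller conditional risk."""
    r_t = conditional_risk(p_target, "target", L_fp, L_fn)
    r_nt = conditional_risk(p_target, "non-target", L_fp, L_fn)
    return ("target", r_t) if r_t <= r_nt else ("non-target", r_nt)
```

Averaging this conditional risk over the distributions of X and Y gives the Bayes risk used as the cost function below.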
Loss Functions
In mission operations, the loss functions are determined by the strategist who decides the priority of the mission. There are several ways of posing the loss function. Depending on the loss function, the resultant paths are likely to be in one of two categories:
Strong emphasis on classified target being an actual target. In this case the loss function for false positive (
In summary, this is when the strategist places more emphasis on not judging non-targets as targets.
In the opposite situation, the emphasis is on the classified non-target being an actual non-target. In this case the loss function for false positive (
In summary, this is when the strategist places more emphasis on not judging targets as non-targets.
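Both emphases can be read off a single quantity: the posterior threshold above which declaring “target” carries the smaller conditional risk. A minimal sketch, assuming zero loss for correct outcomes and illustrative loss values:

```python
def target_threshold(L_fp, L_fn):
    """Posterior P(X = target) above which declaring 'target' has the
    lower conditional risk (zero loss for correct decisions assumed)."""
    return L_fp / (L_fp + L_fn)

# A heavy false-positive penalty means the agent must be very sure
# before declaring a target:
assert target_threshold(L_fp=10.0, L_fn=1.0) > 0.9
# A heavy false-negative penalty means even weak evidence suffices:
assert target_threshold(L_fp=1.0, L_fn=10.0) < 0.1
```

This is how the strategist's loss functions propagate into the agent's behaviour: shifting the threshold changes which objects are worth approaching for further observation.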
3. Optimization Problem
We formulate the cost function,
where
4. Stochastic Dynamic Programming
In the path planning, the agent faces the following decisions at each step:
Whether to stay (and complete the mission) or to move (and take another observation).
If the agent decides to stay, which hypothesis should be chosen?
If the agent decides to move, which way should the agent go?
We approach this decision problem using Bayes decision theory and solve it with stochastic dynamic optimization.
4.1. Decided to Stay
Suppose we are in the case where the agent decided to stay, i.e.,
with
Now the problem is to identify the sets
The probability of false positive and false negative can be defined with their probability distribution function. Let
Then,
and
Substituting these into the Bayes risk formula gives
Now using this formula, we want to find the set
If there exists any
the integrand in Eq. (19) will be negative, thus giving a smaller risk than
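With discrete measurement types, the integrals above reduce to sums over the measurement alphabet. The sketch below assigns each measurement type to the hypothesis with the smaller conditional risk and accumulates the resulting false-positive and false-negative probabilities; all labels and numbers are illustrative assumptions, not the paper's calibrated values.

```python
def decision_regions(likelihood, prior_t, L_fp, L_fn):
    """Assign each measurement type y to the hypothesis with the smaller
    conditional risk, then accumulate the error probabilities.
    Returns (region declared 'target', P(false positive), P(false negative))."""
    region_t, p_fp, p_fn = set(), 0.0, 0.0
    for y in likelihood["target"]:
        # Posterior P(X = target) after observing y once:
        num = likelihood["target"][y] * prior_t
        den = num + likelihood["non-target"][y] * (1.0 - prior_t)
        post = num / den
        if L_fp * (1.0 - post) <= L_fn * post:   # declare 'target'
            region_t.add(y)
            p_fp += likelihood["non-target"][y] * (1.0 - prior_t)
        else:                                     # declare 'non-target'
            p_fn += likelihood["target"][y] * prior_t
    return region_t, p_fp, p_fn

# Illustrative numbers only:
likelihood = {
    "target":     {"y1": 0.6, "y2": 0.3, "y3": 0.1},
    "non-target": {"y1": 0.1, "y2": 0.3, "y3": 0.6},
}
region, p_fp, p_fn = decision_regions(likelihood, 0.5, 1.0, 1.0)
```

The resulting Bayes risk is then the loss-weighted sum of the two error probabilities, which is exactly the quantity minimized when choosing the decision regions.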
4.2. Decided to Move
The Bayes risk for the case where the agent decided to move is formulated as
The expected risk over the measurement
where the second equation is obtained by using Bayes' theorem. There are four cases for the expected risk over the measurement,
4.2.1. Multiple Observations
The Bayes risk for multiple subsequent observations is formulated in this section. Here we introduce more notation:
The Bayes risk of measurement at time
Taking the expectation value over
Since
Substituting this into the Bayes risk equation gives
This is the expected loss when the agent decides to move in some other direction. The actual risk may be larger or smaller depending on the sampled measurement. Supposing that we know the likelihood function, the risk can be determined.
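The move decision can thus be sketched as a one-step lookahead: compare the immediate (“stay”) Bayes risk with the expected minimum risk after one more measurement, averaged over the predictive distribution of Y. For simplicity, the fragment below holds the likelihoods fixed, ignoring their range dependence, and uses illustrative numbers.

```python
def stay_risk(prior_t, L_fp, L_fn):
    """Immediate Bayes risk of classifying now (zero loss when correct)."""
    return min(L_fp * (1.0 - prior_t), L_fn * prior_t)

def expected_risk_after_move(prior_t, likelihood, L_fp, L_fn):
    """Expected minimum risk after one more measurement, averaged over
    the predictive distribution of the measurement type Y."""
    exp_risk = 0.0
    for y in likelihood["target"]:
        p_y = (likelihood["target"][y] * prior_t
               + likelihood["non-target"][y] * (1.0 - prior_t))
        post = likelihood["target"][y] * prior_t / p_y
        exp_risk += p_y * min(L_fp * (1.0 - post), L_fn * post)
    return exp_risk

# Illustrative numbers only:
likelihood = {
    "target":     {"y1": 0.6, "y2": 0.3, "y3": 0.1},
    "non-target": {"y1": 0.1, "y2": 0.3, "y3": 0.6},
}
# Another observation never increases the expected Bayes risk, so the
# stay-or-move choice hinges on how informative the next measurement
# is from the candidate position.
```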
5. Numerical Simulation Results
In this section, we present simulation results obtained by solving the SDP numerically, with MATLAB as the simulation environment. First, we consider two scenarios, i.e., when

The resulting paths and the corresponding Bayes risk when

The resulting paths and the corresponding Bayes risk when

The resulting paths and the corresponding Bayes risk when
Figure 3(a) shows the agent movement in a 2-D space for four different initial positions, each of which is colour-coded. The loss functions are
Figure 4(a) shows the agent movement for the same four initial positions as in Fig. 3(a), but with the loss functions being
Figure 5(a) shows the agent movement with the loss functions being
6. Conclusion & Future Work
In this paper, we proposed a model that accounts for the interaction between the agent kinematics, informatics, classification and strategy. For kinematics, the agent moves on a Manhattan-gridded space with five kinematic decisions: the four directions of movement (up, down, left and right) and stay. For informatics, the measurement is abstracted by three discrete measurement types (i.e.,
As future work, we will investigate several avenues: (1) inclusion of an azimuth-dependent measurement model; (2) online estimation of the likelihood functions; and (3) a mathematical definition of target “deception”. Furthermore, we plan to investigate the effects of couplings between different subsystems (e.g., classification and strategy), as well as of different objective functions (e.g., information and classification performance), in path planning applications.
