Abstract
1. Introduction
TORCS (The Open Racing Car Simulator) [1] is an open-source 3D car racing game released under a GNU license that has been used as a test bed for many AI (Artificial Intelligence) algorithms. Because of its realistic 3D simulation, it has been used for research on intelligent vehicles, robotics and games. For example, the simulator has been used in laboratory-level experiments that monitor human drivers' brain/body signals [2, 3], behavioural patterns [4] and 3D postures [5]. It has also been used for intelligent transportation research: the coordination of intelligent vehicles [6], simulated traffic scenarios [7] and digital human driver modelling [8].
Since 2008, international game AI competitions have changed their basic platform from 2D car racing to 3D car simulation, i.e., TORCS [9]. In competitions, a TORCS server provides each controller with 79 sensory values of a simulated car (
However, developing a sophisticated controller is not a trivial task. The
In recent years, many kinds of computational intelligence methods have been used for developing a sophisticated controller. Onieva et al., the winners of the championships in both 2009 and 2010, proposed a parameterized modular architecture with a fuzzy system and a genetic algorithm [10, 11]. Butz et al. developed their own controller named COBOSTAR and won the 2009 competition held by CEC and CIG [12]. Hoorn et al. proposed an imitative learning controller, adapting a multi-objective evolutionary algorithm [13]. Munoz et al. obtained data (trajectory and speed) from humans and then trained neural networks [14]. Cardamone et al. suggested a high-level action prediction model with high-level information about the track [15]. Seo et al. developed their own heuristic controller, which drives as safely as possible [16, 17]. They used an ANN (Artificial Neural Network) and a linear regression to predict the desired speed in different situations. We call their controller the “BFB3”, the name of the controller used at the CIG 2011 competition (5th rank) [16]. Quadflieg et al. developed a human-readable track recognition system which enables ‘looking into the next curve’, using a combination of advanced pre-processing steps and a simple track-type classifier [18].
In this paper, we propose optimizing the parameters of an autonomous car controller using ESs (Evolutionary Strategies) and describe how the most generalized parameter set can be retrieved from the process of optimization. To this end, we propose a new controller for TORCS whose detailed behaviour is governed by a number of parameters. Our contributions in detail are as follows.
We propose a new controller for TORCS by combining BFB3 [17] and Quadflieg et al.'s approach [18] to provide fast and accurate assessments of the driving situation by pre-processing track sensor values from a TORCS server. To accelerate in an accurate and aggressive manner, we modify the acceleration and brake control part and the steering control part of the BFB3 by introducing 10 new parameters and using the curvature information of a track segment, as in [18].
We propose optimizing the parameters of our controller using a simple ES and a self-adaptive ES. To find the optimized values of the parameters, we use 10+10 ESs. We obtained the track-dependent optimal parameter sets separately for 6 tracks (Alpine1, ASpeedway, Cgtrack2, Eroad, Wheel1 and Wheel2), i.e., 12 optimal parameter sets in total. Experimental results show that our controller significantly improves the performance of the BFB3 for the 6 tracks. Moreover, the self-adaptive ES is better than the simple ES on most tracks.
We derive the most generalized parameter set from the process of optimization to develop a practical track-independent controller. We use intermediate generations as well as the latest generations produced during the process of finding the track-dependent optimal parameter sets. Among the parameter sets of these generations, we choose the one which minimizes the sum of the differences of lap times from track-dependent controllers for the 6 tracks. We experimentally compare the generalized controller and the BFB3 on 12 tracks.
This paper is organized as follows. Section 2 introduces the BFB3, on which our controller is based, and describes its weaknesses. In Section 3, we present our modifications to improve its performance. Specifically, we introduce new additional parameters and rules to enhance the controller and describe the learning approaches (ES) applied to optimize our new parameters. In addition, we present how to derive the most general parameter set from the solutions found. Section 4 shows our experimental results, where we give comparative studies analysing two different learning approaches and compare the performances of the BFB3, the track-dependent controller and the track-independent one. In Section 5, we offer our conclusions.
2. Overview of the BFB3
In this section, we give an overview of the BFB3, which our proposed controller is based on. The BFB3 follows the philosophy of “modular architecture” proposed by Onieva et al. [10]. Figure 1 a) shows the modules making up the controller. Each module controls the acceleration and the brake, the steering, gears, etc., of the simulated car. The most important and complex module is the acceleration and brake control module composed of several sub-modules, such as the desired speed module, the ABS and the TCL module. We present a brief summary of the desired speed module and the steering control module, which are modified in our new controller. We keep the other components intact in our new controller.

Basic modular components of BFB3 (SAZ=Speed Adaptation Zone)
2.1. The desired speed module
The desired speed module determines the
The BFB3 uses different strategies according to the current situation of the car, which is categorized into three groups by the value of the centre
If CT ≥ 150m, the car is considered to be on a “straight” track segment (i.e., there is no obstacle in front of the car and it is less at risk of being stuck) and thus the
If 70m < CT < 150m, the car is considered to be in a
where
Seo et al. tried to retrieve the generalized parameter
If CT ≤ 70m, the car is considered to be at a “corner” track segment and a linear regression analysis is used to determine the
2.2. The steering control module
The basic strategy of the BFB3 for the steering control is to have the car travel the longest free distance. The steering control module divides the current situation of the car into two cases.
If the car is in a straight track segment or a SAZ (i.e., CT > 70m), the controller steers the car towards the direction of the track axis. That is, if the car is leaning toward the right or left side of the track, the controller forces the car to move towards the centre of the track. In this case, the target steering value is as follows:
where “
If the car is in a corner track segment (i.e., CT ≤ 70m), the BFB3 steers the car towards the direction of the
where
2.3. The weaknesses of the BFB3
Despite the modules mentioned above, we found through a series of observations that the BFB3 is inefficient for two reasons. First, the probability
Secondly, the BFB3 forces the car to move towards the centre of the track if the value of CT is greater than 70m, i.e., when the car is in a straight track segment or in a SAZ. However, this strategy does not coincide with a common cornering strategy witnessed in real-world racing. Figure 3 shows the variation of trackPos measured on part of Wheel1 (573m ~ 688m). The y-axis represents the position of the car: 1 and −1 mean that the car is on either of the edges of the track, and 0 means that the car is in the centre of the track. From 670m, the BFB3 considers the track segment as a corner and starts to steer the car towards the direction of the track sensor with the greatest value. As can be seen, the BFB3 keeps the value of trackPos near 0 up to 670m and changes its trackPos suddenly when it enters the corner. This strategy is dangerous because a sudden change of steering causes the car to skid. In real-world racing, it is common knowledge that drivers should prepare for an upcoming corner by moving the car in the same direction as the corner in advance (see Figure 3).

Variation of probability predicted by ANN during a single lap at Wheel1

Variation of
3. Proposed Methods for a Generalized Controller
In this section, we give a description of our controller based on curvature and learning. First, we describe the modules modified from the BFB3 and then present the learning approach to determine the parameters used in our modified controller.
3.1. The modified modules
To overcome the weaknesses of the BFB3, we modify the two modules - the desired speed module and the steering control module. We improve the desired speed module using the curvature information of a track segment. We first describe how to compute the curvature of the track segment and then present the desired speed module and the steering control module of our modified controller.
3.1.1. The curvature computation
Basically, our computation of the curvature is the same as the one used in [18]. The curvature is computed using the
First, we vectorize the 17 track sensors on the 2D polar coordinate. As they are already known, the angle and scalar (distance) values of each sensor, the
where
After calculating these 17
The curvature of the upcoming corner is represented by the sum of the angles between two adjacent outline vectors.
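As an illustration, the computation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in our controller or in [18]: the function name `curvature_from_sensors` is hypothetical, all supplied sensors are used without any selection step, and the angle between adjacent outline vectors is taken as a signed angle in degrees (positive for a left turn).

```python
import math

def curvature_from_sensors(angles_deg, distances):
    """Sum of signed turning angles (degrees) along the track outline.

    angles_deg: sensor directions relative to the car heading, in degrees.
    distances:  range reading of each sensor, in metres.
    """
    # Endpoint of each sensor ray in the car's frame (polar -> Cartesian).
    points = [
        (d * math.cos(math.radians(a)), d * math.sin(math.radians(a)))
        for a, d in zip(angles_deg, distances)
    ]
    # Outline vectors join the endpoints of adjacent sensors.
    outline = [
        (x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    total = 0.0
    for (ux, uy), (vx, vy) in zip(outline, outline[1:]):
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        # Signed angle between two adjacent outline vectors (assumed convention).
        total += math.degrees(math.atan2(cross, dot))
    return total
```

With this convention, a flat wall directly ahead gives a value close to zero, while endpoints lying on an arc give a value that grows with the angle swept by the arc.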

Descriptions of the computation for curvature information
For more precise computation of the curvature, however, we have to carefully choose the

Variation of curvature information during a single lap at Wheel1
Figure 5 shows the variation of the curvature values computed by the method above during a single lap at Wheel1. As can be seen, unlike the probability variation (Figure 2), the curvature information varies more frequently. For example, when the car is located at between 2324m and 2927m of the track, the curvature values do not exceed 30 but
3.1.2. The modified desired speed module
We utilized the curvature
In the “straight” track segment, two parameters

The desired speed computation for a straight track segment
In an SAZ, the curvature value

The desired speed computation for a SAZ
3.1.3. The modified steering control module
To improve the steering control module of the BFB3, we deal with the “straight” track segment and the SAZ separately, which are processed identically in the BFB3. If the car is considered in the straight track segment, the controller keeps the car at the centre of the track axis, which is the same as the original steering control of the BFB3. Also, if the car is considered in the corner track segment, we do the same control as with the BFB3.
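As a minimal sketch of the case split described above, the following Python function maps a CT value to one of the three situations using the 150m and 70m thresholds given in Section 2.1. The function and constant names are illustrative assumptions and are not taken from the BFB3 source code.

```python
STRAIGHT_MIN_CT = 150.0  # metres; CT >= 150m is treated as a straight segment
CORNER_MAX_CT = 70.0     # metres; CT <= 70m is treated as a corner segment

def classify_segment(ct):
    """Return 'straight', 'saz' or 'corner' for a CT value given in metres."""
    if ct >= STRAIGHT_MIN_CT:
        return "straight"
    if ct > CORNER_MAX_CT:
        return "saz"  # speed adaptation zone: 70m < CT < 150m
    return "corner"
```

In the BFB3, the steering module treats the first two cases identically, whereas the modified module branches on all three.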
A different case from the BFB3 is when the car is considered in a SAZ. We introduce a new parameter

The modified steering control module
3.2. Learning approach
3.2.1. Evolutionary strategy
Recently, evolutionary computation has been widely used to optimize robotic systems: for example, parallel robotic manipulators [19], motion planning for swarm robots [20] and humanoid motion planning [21]. We tuned the 10 new parameters

Six tracks used to optimize parameter sets
The first method is a “simple” 10+10 ES. Initially, we make 10 individuals for the 1st generation. Each individual consists of the ten parameters and the initial value of each parameter is randomly chosen within the range 0~1. Next, we mutate every individual for the next generation. For each parameter
where
At each generation, the 10 new individuals generated by the mutation are evaluated by the completion time of a single lap. Among the 20 individuals (the 10 prior individuals and the 10 new individuals), only the top 10 survive into the next generation (Figure 10). However, the main drawback of this simple ES is that some parameters may keep increasing over all generations.
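The 10+10 selection scheme can be sketched in Python as follows. This is only an illustrative sketch: the mutation here is an assumed Gaussian perturbation with a fixed step `sigma`, which is not necessarily the mutation operator defined above, and `fitness` stands in for the lap time measured in TORCS (lower is better). All names are hypothetical.

```python
import random

def simple_plus_es(fitness, n_params=10, mu=10, generations=100,
                   sigma=0.1, seed=0):
    """Run a (mu + mu) ES and return the best individual and the best-fitness history."""
    rng = random.Random(seed)
    # First generation: mu individuals with parameters drawn uniformly from [0, 1).
    population = [[rng.random() for _ in range(n_params)] for _ in range(mu)]
    scored = sorted(((fitness(ind), ind) for ind in population),
                    key=lambda pair: pair[0])
    history = [scored[0][0]]
    for _ in range(generations):
        # Every parent produces one child (assumed Gaussian mutation).
        children = [[x + rng.gauss(0.0, sigma) for x in ind]
                    for _, ind in scored]
        scored_children = [(fitness(child), child) for child in children]
        # Plus selection: parents and children compete, the best mu survive.
        scored = sorted(scored + scored_children,
                        key=lambda pair: pair[0])[:mu]
        history.append(scored[0][0])
    return scored[0][1], history
```

Because parents compete with their children, the best lap time in the population can never get worse from one generation to the next.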

Overview of the simple and self-adaptive evolutionary strategy

Flowchart for choosing the most general parameter set
The second method is a “self-adaptive” 10+10 ES. This method is the same as the simple ES, except for the mutation operation used. For each individual, we define a weighted value (
Note that, unlike the simple ES, the self-adaptive ES uses the two-stage mutation process (Figure 10).
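A two-stage mutation of this kind can be sketched as follows. The sketch uses the textbook log-normal self-adaptation rule, in which the step size of the individual is mutated first and the parameters are then mutated with the new step size; this is an assumption for illustration and may differ from the exact update used in our experiments. The names `self_adaptive_mutate`, `step` and `tau` are hypothetical.

```python
import math
import random

def self_adaptive_mutate(params, step, rng, tau=None):
    """Return (new_params, new_step) after a two-stage self-adaptive mutation."""
    if tau is None:
        # Common default learning rate for log-normal self-adaptation.
        tau = 1.0 / math.sqrt(len(params))
    # Stage 1: mutate the individual's own step size; it stays positive.
    new_step = step * math.exp(tau * rng.gauss(0.0, 1.0))
    # Stage 2: mutate every parameter using the freshly mutated step size.
    new_params = [x + new_step * rng.gauss(0.0, 1.0) for x in params]
    return new_params, new_step
```

Since each child inherits its mutated step size, individuals whose step size suits the current region of the search space tend to survive selection together with that step size.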
3.2.2. The generalization of the parameter set
We generalize the parameter set by observing the behaviour of the optimized controller for each track. The parameter sets obtained by the learning strategy are dependent on the tracks and the sets are not practical for the competition. In order to find a general parameter set, we gather 30 generations by extracting 5 generations (the 20th, the 40th, the 60th, the 80th and the 100th) from 100 generations for the 6 tracks. In other words, we have 300 parameter sets in total. For each selected individual, we estimate the lap times for the 6 tracks and compute the sum of the lap time differences from the track-dependent optimized controllers. Among the 300 parameter sets, we choose the parameter set with the minimum standard deviation as the most general parameter set (Figure 11).
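The selection step can be sketched in Python as follows. As an assumption, the sketch scores each candidate by the sum of its lap-time differences from the track-dependent optimized controllers; a different statistic could be substituted in the `score` function. The function name and the dictionary layout are hypothetical.

```python
def most_general_parameter_set(lap_times, best_times):
    """Pick the candidate whose lap times are closest to the per-track optima.

    lap_times:  {candidate_name: {track_name: lap time in seconds}}
    best_times: {track_name: lap time of the track-dependent controller}
    """
    def score(name):
        # Sum over all tracks of the gap to the track-dependent controller.
        return sum(lap_times[name][track] - best_times[track]
                   for track in best_times)
    return min(lap_times, key=score)
```

In our setting, `lap_times` would hold the 300 candidate parameter sets, each evaluated on the 6 tracks.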
4. Experimental results
In this section, we experimentally analyse our controller. First, we compute parameter sets using the two strategies and analyse the performance of the controller with the parameter sets. Next, we generalize the parameter set and analyse the performance of the controller with the generalized parameter set.
4.1. Evolutionary strategy
We obtained 12 track-dependent parameter sets optimized by the two different learning strategies (the simple ES and the self-adaptive ES). We produced 100 generations to search the space. Table 1 shows the values of 12 optimal parameter sets optimized by two learning strategies for the 6 tracks and the lap times of the BFB3. As shown in the table, our controller significantly improved the performance of the BFB3. In addition, the self-adaptive ES found better parameters than the simple ES except for the tracks ERoad and Wheel2.
Results of evolutionary learning
(c) Lap times of three different controllers on the six tracks. (Bold indicates the best time.)
Lap times of track-dependent controllers on the six tracks
Figure 12 shows the comparison of the performance of the simple ES and the self-adaptive ES over 100 generations. The x-axis represents the generation number and the y-axis represents the lap time (in seconds). For CgTrack2 and ERoad, the performance of the two strategies is similar. For Wheel2, the self-adaptive ES is worse than the simple ES during the early generations but the two strategies are almost the same towards the end of the generations. For the other three tracks, the self-adaptive ES is better than the simple ES. Consequently, the self-adaptive ES generally escapes from a local minimum within a relatively short time and, thus, the self-adaptive ES is more suitable for learning the track-dependent optimal controllers than the simple ES.
Table 2 shows the lap times of track-dependent controllers trained by the self-adaptive ES on the 6 tracks. As expected, the parameter set with the best record (indicated in bold face) for each track is the parameter set trained on the same track. However, the optimal parameter set of a particular track may perform poorly on the other tracks. Specifically, the controller with the parameter set optimized on the track “Cgtrack2” ran the track “Wheel1” in 118.99 sec, which is 28 sec slower than the parameter set optimized on the track “Wheel1.” Therefore, we need to find a general parameter set to make a practical track-independent controller.
4.2. Extracting the general parameter set
We generalized the parameter sets using the method described in Section 3.2. We used the self-adaptive ES for the generalization. Table 3 shows the top 5 individuals (parameter sets) from the search space of the self-adaptive ES, i.e., those with the smallest sums of differences. We name each parameter set by its track, generation number and individual number, in the form <track name>_<generation number>_<individual number>. In our experiment, the best parameter set was Wheel1_100_8 and the second was Wheel2_20_1.

Variation for the average performance of the simple ES and the self-adaptive ES
Lap times of the track-dependent parameter sets and the 5 most general parameter sets

Comparison of the BFB3, track-dependent controllers and two generalized controllers at the six training tracks and other tracks
Values of the parameter set
(b) Wheel2_20_1.
The values of the parameter set Wheel1_100_8 are given in Table 4. Even though we used the 10 parameters, only four parameters are meaningful. First,
To test the generality of the two parameter sets, we compared four controllers: the generalized controller with Wheel1_100_8, the generalized controller with Wheel2_20_1, the optimal track-dependent controller, and the BFB3. The performance of each controller was measured by the elapsed time to complete a single lap and the comparisons were carried out for the 6 tracks, Alpine1, ASpeedWay, Cgtrack2, Eroad, Wheel1 and Wheel2 (see Figure 13). The controllers with Wheel1_100_8 and Wheel2_20_1 were both slower on average than the track-dependent controller, but they were always faster than the BFB3.
Although these generalized parameter sets do not depend on a particular track, they may yield slightly lower performance on some tracks because they are still tied to the 6 training tracks. Figure 13 shows the lap times on twelve TORCS tracks other than the 6 tracks used in the ES. On these tracks, which were not considered during optimization, our controller is not always better than the BFB3. On only one track (CgSpeedWay1), the controller with Wheel1_100_8 had the best record among the three controllers. On three tracks (Etrack2, Etrack3 and Ruudskogen), Wheel2_20_1 was the best. On four tracks (BSpeedWay, CSpeedWay, DSpeedWay and ESpeedWay), the lap times of the three controllers were nearly identical. On the other four tracks, the BFB3 was the best.

Variations for the values of

Two series of screenshots at Eroad
We also compared the steering strategies of our controller and the BFB3 in SAZs. Figure 14 shows the variations of the
Figure 15 shows a series of screenshots of the BFB3 (to the left) and our controller (to the right) at a corner of the track “Eroad”. In Figure 15 b), our controller moved the car to a position closer to the inner side of the track than the BFB3 and it skidded relatively less than the BFB3 (Figure 15 c)). As a result, the BFB3 bumped the car against a track fence but our controller passed the corner smoothly (Figure 15 d)).
5. Conclusion and Future Works
In this paper, we proposed a new controller based on an established controller, the BFB3 [17]. To this end, we modified the original desired speed module and the steering control module of the BFB3 and utilized the curvature information. In addition, we defined additional rules and new parameters to improve the desired speed module and the steering control module. Moreover, we optimized the parameters using the simple ES and the self-adaptive ES, and extracted the most general parameter set from the partial search space of the ES. As a result, our proposed controller drives faster than the BFB3 on all 6 training tracks (Alpine1, AspeedWay, Cgtrack2, Eroad, Wheel1 and Wheel2). In addition, it reduces the time and effort for tuning the parameters of the manually designed controller.
Although this generalized parameter set does not depend on a particular track, it may yield slightly lower performance on tracks other than the 6 training tracks. This is because the generalized parameter set is still tied to the set of the 6 tracks. However, we expect that, if the set of training tracks is extended sufficiently, our method can obtain a more general parameter set. In this work, we use if-then rules to represent the control mechanisms for the autonomous car, but there are several alternatives. For example, type-1 and type-2 fuzzy logic controllers can handle uncertainty and vagueness in the area of robotic control [22, 23]. In particular, the type-2 fuzzy logic controller is promising for the design of accurate control systems, but it requires additional parameters to be optimized [24, 25, 26].
