Evolving Steering Behaviours with
Daniel Vogel / University of Toronto
Course project for CSC 2521 Artificial Life
This report describes a genetic programming system and experimental results for evolving steering behaviours. The system was developed as a flexible C++ tool kit for prototyping a range of genetic programming problems. For this project, the system was used to solve five steering behaviour problems including simple behaviours for individuals and pairs (Seek, Flee, and Pursue); and combined behaviours for groups (Tag and Corral).
§ System Executable (Windows/OpenGL ZIP Archive)
The inspiration for this work comes from a series of papers by Craig Reynolds in which he described the evolution of steering behaviours like corridor following, obstacle avoidance, and playing a game of tag using genetic programming [7, 8, 10]. In this project I not only produce similar results to his 1994 paper, Competition, Co-Evolution and the Game of Tag, but also evolve a coordinated steering behaviour involving three agents which I call Corral.
Craig Reynolds is perhaps best known for his 1987 work on flocking behaviours commonly referred to as Boids . This work later became a subset of many steering behaviours he developed for autonomous agents . Reynolds defines steering as the middle layer of a three layer motion behaviour model (see Figure 1). Whereas locomotion is concerned with control signals to move the agent in a desired direction at a certain speed; and action selection is concerned with signals to move the agent towards a high-level goal; steering is a locomotion-independent control signal used to help achieve an action-selection goal.
Figure 1. A Hierarchy Of Motion Behaviours
Steering behaviours can be divided into two groups: simple behaviours for individuals and pairs; and combined behaviours for groups. A taxonomy of the steering behaviours presented by Reynolds appears in Figure 2 . Seek, flee, Pursuit, and Evasion form a base for many others. For this reason, Seek, Flee and Pursuit were selected as initial behaviours to evolve in this project. The other two behaviours evolved in this project, Tag and Corral, don't appear in the taxonomy; however, they are essentially higher-level combinations of those base steering behaviours.
Figure 2. Taxonomy of Steering Behaviours.
Genetic programming is an application of genetic algorithms to computer programs. An overview of genetic algorithms can be found in the Genetic Algorithms FAQ . While genetic algorithms typically use a fixed number of genes in the evolved genotype, genetic programming uses a variable length program fragment as the genotype. A convenient format for the program syntax is a functional language like LISP where functions and terminals form a tree structure and evaluate to a single value. During evolution, the usual operations are performed on an individual program fragment: crossover breeding between two individuals and mutation being the most popular. Langdon provides a simple overview of genetic programming , Langdon and Qureshi give more detail and compare it to other search techniques . Finally, Koza's 1992 book provides an in-depth discussion . Karl Sims well-known work in the evolution of creatures and aesthetic images provide additional examples of the application of genetic algorithms and genetic programming to solving related problems [11, 12].
My approach was to design and develop a genetic programming tool kit to test and visualize different fitness functions and evolutionary parameters. I wanted to begin with learning simple steering behaviours “ partly to test the tool kit and genetic operators “ then build on these to learn more complex group co-evolution and coordinated behaviour.
Before evolving the steering behaviours themselves, I created a flexible simulation system with basic visualization functionality, code to parse and evaluate LISP-like functions, and implement evolutionary operators. These modules were combined into a tool kit of sorts. I say of sorts, since the final tool kit still has shortcomings in its modularity and general implementation clarity ”'™s very much prototype research code at this point, but with some clean up, I plan to make it available to others.
The tool kit is written in C++ with OpenGL, GLUT, and GLUI for viewing and interface. I use an enhanced GLUT available from Dr Robert Fletcher which provides control over polling of new events with a call to glutCheckLoop(). This is to eliminate relinquishing control of the event loop to GLUT with the typical glutMainLoop() call. The prototype application's user interface provides a way to visualize sample simulation runs of the evolved steering behaviour in addition to controlling basic evolutionary parameters of the tool kit (see Figure 3). The current user interface has limited control of all genetic parameters which are adjusted directly in the code.
Figure 3. Prototype Application. (a) simulation environment display. (b) evolved agent local-coordinate view. (c) evolved agent perceptual field. (d) evolution parameters. (e) debug and visualization control.
Kuhlmann and Hollick provided an introductory overview of genetic programming using C/C++ and provided some code. Using their code as a rough pattern, I wrote my system to be more object-oriented and flexible so that I could easily plug-in different problems and functions.
The tool kit has five major classes:
Problem Main class to evolve, visualize, and simulate general problems..
Population Creates and manages the population of individuals and applies evolutionary operators.
Node Generate and evaluate LISP-like functional programs.
Func The LISP-like operators and terminals.
Vehicle Compute and display the vehicles used in the simulations.
This class handles creating the random population of initial individuals, performs the two genetic operations of crossover and mutation, and compiles basic statistics about the population used for analysis.
Since genetic programming typically uses LISP-like functional code, the system needed a way to generate and evaluate this syntax. An n-ary tree data structure is used with each node acting as a function, constant or variable. Specifically, a Node object has a pointer to a Func object and every function, constant, and variable is inherited from the Func abstract base class. A factory object called FuncChooser, provides a way to generate a function, constant or variable either completely randomly, randomly chosen with certain arity, randomly chosen with arity greater than a specified size, or by selecting a specific function.
Each problem is inherited from the abstract base class Problem which provides basic evolution, initialization, visualization, and simulation playback control. A new problem usually needs to implement three virtual methods: InitFunctions() to load FuncChooser with the functions, constants, and variables used for the problem; Render() to render a trial step to the screen; and Fitness(Node*) to evaluate the fitness of a certain individual during a simulation.
For evolution, I tried two techniques: survival of the fittest and steady-state. My initial prototype used the survival of the fittest strategy implemented by Kuhlmann and Hollick with each generation being evolved together en mass. This technique sorted the population according to fitness, killed off a percentage of the least fit individuals, and crossbred new individuals to replace those killed. Pairs of new individuals were birthed from consecutive pairs of top performing individuals with each pair producing two crossbred offspring. The middle performing individuals stayed as they are. Without a mutation operator, this strategy tended to exploit rather than explore the population and resulted in sub-optimal results “ it seemed to get caught in local minima when an adequately performing individual's genes dominated the entire population.
For all the results discussed in this report, I used the steady-state evolutionary technique as described by Reynolds . In this technique, two individuals (parents) are selected from the population using tournament selection. A tournament size of 7 was used for each parent with the most-fit being selected. Using crossover and mutation, the two parents birthed a new individual (their child). To make room in the current population for the child, an individual was selected for removal using a hybrid of tournament and uniform selection. A tournament size of 7 was used with the least-fit individual selected half the time, and the other half of the time an individual was selected uniformly across the entire population. As Reynolds discusses, the steady-state technique not only promotes diversity through exploration, but there is also some agreement it accelerates convergence to a solution .
The addition of a mutation operator promotes exploration. My system mutates the new child with a probability 0.1. I use a simple mutation operator which picks a node in the individual and randomly replaces the node with a new function of matching arity. This is not the most robust implementation of mutation, but it should assist the exploration of the problem space.
Koza outlines five preliminary steps to solving a problem using genetic programming : choosing terminals, choosing functions, designing a fitness function, setting control parameters, and establishing termination criterion. I selected a general set of functions and terminals used for all of my experiments, with additional functions and terminals used for particular problems. Each fitness function was designed for a particular problem. For control parameters, I limit the length of program segments to be no more than 32 statements during initial generation and no more than 64 statements during genetic operations. The termination criterion was defined simply as the number of generations to evolve.
All experiments evolve a functional program to steer an abstract autonomous agent within an environment. The evolved steering behaviour attempts to achieve a predefined goal by maximizing a fitness function ranking how well the agent performs during a simulation in the environment. In my framework the following are expressed as a single genetic programming problem within the Problem class: a goal with its associated environment, inhabiting abstract autonomous agents, and genetic programming parameters.
My choices of agent vehicle, steering input, and function program input variables are again modelled after those in Reynolds's Co-Evolution Tag paper .
All problems use abstract autonomous agents in the form of frictionless, two-dimensional vehicles moving at a constant velocity (see Figure 4). The Vehicle class defines a vehicle with a location, velocity, and heading. The vehicle's heading is in units of revolutions, a real valued range between 0 and 1. For example, a heading of 0.5 corresponds to 180 degrees or PI radians.
Figure 4. Abstract
Autonomous Agent Vehicle.
The stick represents the direction and velocity vector.
For evolved agents, their functional program determines the steering angle applied at the current time step. A new heading is calculated by adding the input steering angle to the current heading using a wrapping function. The wrapping function keeps the new heading in the desired normalized revolutionary range. Since there is no restriction on the steering input, the vehicles essentially have no momentum and can turn instantly to face any direction in a single time step.
The primary input to an agent's functional program is the location of other agents in the environment. Two variables, local-x and local-y express the integer position of another agent in the current agent's local coordinate system (see Figure 5). Note that the other agent's heading is not relevant, since without momentum a vehicle can turn instantly.
Figure 5. Local
Coordinates. (a) Global
coordinates of environment.
(b) Local coordinates of blue agent with respect to red agent's position
All experiments selected from the following standard functions for use in the evolved program:
(+ a b)
a plus b
(- a b)
a minus b
(* a b)
a times b
(% a b)
b=0 then 1 else a divided by b
(min a b)
if a<b then a else b
(max a b)
if a>b then a else b
(iflte a b c d)
<= b then c else d
Constants were generated uniformly to be real numbers between 0 and 1.
The seek behaviour steers a vehicle to a specified target position in global space. See Reynolds's steering behaviour paper for a more detailed description . The optimal strategy is to steer the vehicle so that'™s heading directly towards the target.
The environment consists of two vehicles, the evolved agent in red and the stationary target in blue.
All standard functions and constants described above were used with the addition of:
position of target in local coordinates
Maximize the number of time steps where the target is within 2 vehicle-lengths.
The average fitness was calculated by running 10 simulations, each simulation consisting of 25 time steps. Each simulation was initialized with the target placed at the origin and the evolved agent placed randomly in an area shaped like a disk. The evolved agent was initialized with a random heading and assigned a constant velocity of one vehicle-length. At each time step of the simulation, the agent's steering angle was computed using the evolved program and the position of the agent updated.
During this run, the starting disk area for the evolved agent had a 5 vehicle-length inner radius and a 10 vehicle-length outer radius. The score wasn't normalized and the score didn't take into account the agent's starting location. Thus, if an agent started very close to the target on many trials, its average score would be higher than an agent starting farther. In the next run below, the score was normalized, the initial position taken into account, and the threshold for closeness decreased to one vehicle-length.
A quick inspection of the initially generated individuals showed various behaviours like looping, stumbling, and straight-line running (see Figure 6). There were even some individuals, like individual 3 that actually moved in a straight line directly away from the target.
Figure 6. Initial Individuals. From left to right, individuals 3, 1, and 10.
Even after 1 generation (100 individuals bred), some promising behaviours started emerging (see Figure 7). Some agents depended solely on the local-y variable like individuals 174 and 182, while others like individual 142 formed complex expressions that amounted to stumbling towards the target.
Figure 7. Individuals after 1 generation. From left to right, individuals 174, 182, and 142.
The population was then evolved for a significant time, 60 generations. The graph in Figure 8 summarizes the best fitness, average, and worst fitness during evolution. It shows a slow trend upwards, with the biggest jump occurring in the first 6 generations. This may be due to the relative simplicity of the problem. Note the odd jump in worst score at generation 23: the standard deviation at this generation was only 0.622 indicating that all individuals were performing quite well. The random nature of the evolutionary process would not be likely to maintain such a high mean fitness score for long
Figure 8. Fitness Graph
After the 60 generations, individuals still maintained the stumbling behavior, but seemed to be more consistent at staying close to the target after reaching by employing a tight turning radius after arriving. An examination of an individual's steering force given the local coordinates of the target is easily visualized using a perceptual map . For example, the perceptual map for individual 6008 in Figure 9 shows the simplistic strategy of a single steering angle depending only on which side the target is. Generally, this visualization with a perceptual map is more useful than attempting to interpret the generated program code, which is also shown below for completeness.
Figure 9. Individual 6008. (left and centre) Two runs illustrating the strategy of quick stumble, then tight circle. (right) the perceptual field map. Note the very simple strategy where the direction do double duty as stumble towards, and circle around.
Evolved code for individual 6008 with fitness 18.3:
(* (abs (- (% (iflte (iflte (% (max (+ (min local-y 0.828059) local-y) (iflte (iflte (% (max local-x local-y) 0.828059) 0.458876 local-x 0.623676) 0.458876 0.623676 local-x)) local-x) local-y local-x (max (+ (min 0.828059 local-y) local-y) (iflte (iflte (+ (max 0.458876 0.293619) local-x) 0.948424 0.828059 local-x) 0.458876 (- (max (+ local-x (iflte local-x 0.623676 0.623676 local-x)) 0.828059) 0.828059) local-x))) 0.458876 0.00952177 local-x) local-x) 0.828059)) 0.927396)
The disk area was increased to a 10 vehicle-length inner radius and 15 vehicle-length outer radius. The fitness score was normalized with the initial distance from agent to target taken into account. Also, the threshold for closeness to the target was decreased from 2.0 to 1.25 vehicle-lengths. The hope is that the difficulty of the task will be increased and at the very least, more diverse behaviour found.
After 30 generations, the evolved strategy appeared to be more efficient with significantly less stumbling during a more direct approach to the target and, upon arrival, a tighter turn radius (see Figure 10). The perceptual map shows that the strategy is taking into account which quadrant the target is in, rather than simply whether it is on the left or right as in the previous run. Examination of the fitness graph shows a similar pattern to the first run with the largest jump in best and average fitness occurring in the first 10 generations, then levelling off (See Figure 11). A slight upwards trend is apparent in the mean fitness score indicating that the successful strategy of the top individuals spread throughout the population.
Figure 10. Individual 10644. (left and centre) Two runs illustrating the strategy with a more efficient path to the target, and then a very tight spin afterwards. (right) the perceptual field map.
Evolved code for individual 10644 with fitness 1.0276:
(abs (iflte (min 0.943571 (min (abs local-y) 0.493088)) local-y (- 0.943571 local-x) (+ (- local-x (min (abs (max (+ (- (iflte (min 0.493088 (* (abs local-x) 0.943571)) 0.493088 (max 0.853633 local-y) 0.368267) (min local-y 0.943571)) 0.455123) (- (- 0.368267 0.493088) 0.77337))) 0.493088)) 0.943571)))
Figure 11. Fitness Graph. NOTE: I had some small error with the normalization evident by the fitness values greater than 1.0.
The flee behaviour is the opposite of seek, with the agent steering away from a target. The optimal strategy is to steer the vehicle so that it points directly away from the target.
As in seek, the environment consists of two vehicles, this time the evolved agent is in orange and the stationary target remains in blue. The evolved agent has a velocity of one vehicle-length per time step. The number of trials was increased to 20 with the number of time steps per simulation remained at 25. The agent was initialized at a random location on a 2.0 vehicle-length diameter circle.
All standard functions and constants described above were used with the addition of:
position of target in local coordinates
Maximize the sum of the squared distance from the agent to the target across all time steps in the simulation. A normalized score was calculated which also takes into account the starting distance from the agent to the target.
The average fitness was calculated in the same way as seek, but using the flee fitness function.
Initial random strategies showed some promise, but were far from optimal (see Figure 12). For example, individual 197, always travelled in a straight line, 32 stumbles as it turns away from the target, and 73 has a slight curve as it slowly turns away.
Figure 12. Initial Individuals. From left to right, individuals 197, 32, and 73.
The population was evolved for 50 generations. Examination of the best individual behaviour unfortunately wasn't the most optimal (see Figure 13). The optimal behaviour requires a very accurate, immediate turn so that the agent is facing perfectly away from the target. Individuals simply proceeded in a straight line if the target was anywhere behind it, and when the target was in front, it made close to a 180 degree turn.
Figure 13, Individual 15223
Evolved code for individual 15223 with fitness 0.91632:
(iflte local-y (iflte (* (max (abs (* (* (max (abs (* (* local-x (+ local-x (min (- (iflte (+ local-x 0.539201) local-y 0.539201 local-x) local-x) (+ (abs (abs 0.704123)) local-x)))) (max (* (+ (* local-y (abs local-x)) 0.739677) local-y) (% 0.0250557 local-x)))) (abs local-y)) local-y) (iflte (* (abs local-y) local-x) local-x local-x local-y))) local-x) 0.916715) local-x local-x 0.0250557) local-x 0.539201)
Looking at the fitness graph, the best fitness seems to level out at just over 0.9 after just the first 4 generations (see Figure 14). A better designed fitness function may help the population find a better strategy. Ideas for better design include increasing the start position diameter, always pointing the agent towards target, and randomizing the target position.
Figure 14. Fitness Graph
Pursuit is similar to seek, except that the agent is trying to remain near (or on) a moving target (the prey). An optimal strategy is to seek to a position ahead of the target “ where the target is most likely to be in a certain amount of time.
The average fitness was calculated by running 5 simulations, with a simulation consisting of 50 time steps. Each simulation was initialized with the target placed at the origin and the evolved agent placed randomly in a disk with a 5 vehicle-length inner radius of and a 10 vehicle-length outer radius. The evolved agent was initialized with a random heading and assigned a constant velocity of 1.25 vehicle-lengths, and the prey initialized at the origin with a velocity of 1 vehicle-length. As before, at each time step of the simulation, the agent's steering angle was computed using the evolved program and the position of the agents updated.
As before, the environment consists of two vehicles, the evolved agent in green and the stationary target in blue. The evolved agent wandered around the environment. Wandering was achieved by selecting a normally distributed steering input between -0.02 and 0.02, and with a prob of 0.025, choosing a large steering angle. This reduced the twitching of the agent, and kept it moving in a general direction. A better solution may have been to implement Reynolds's wander steering behaviour .
All standard functions and constants described above were used with the addition of:
position of target in local coordinates
Maximize the sum of the squared distance from the agent to the target across all time steps in the simulation.
The average fitness was calculated in the same way as seek, but using the Pursue fitness function.
The best individuals from the initial generation exhibited behaviours with some promise. For example, they seemed to generally track the movements of the prey, but the prey was rarely caught.
Figure 15. Early Individuals. From left to right, individuals 929 and 1053.
After an additional 20 generations, a slightly robust behaviour emerged (see Figure 16), but the top score did not improve significantly with an increase from about 3 to just over 9. The mean had levelled out around generation 15 (see Figure 17).
Figure 16. Mature Individuals. From left to right, individuals 929 and 1053.
Figure 17. Fitness Graph
In an attempt to improve the learned behaviour, the fitness function was improved so that the fitness space was more continuous encouraging agents to find the optimal solution.
The individual's fitness score was calculated as follows: At each time step, if the pursuer was within 10 vehicle-lengths of the prey, they were awarded a score between 0 and 1 as a linear function of their distance to the prey. In addition, a bonus of 1 was added if they were within 2 vehicle lengths of the prey. This two-stage scoring scheme allows random pursuers to more easily learn that staying close to the prey increases fitness, and staying very close to the prey is optimal. Other parameter changes included increasing each simulation to 40 time steps, running 8 trials, and reducing the purser's position to lie on a 3 vehicle-length circle around the prey. To aid in visualizing the fitness of the pursuer, a gold coloured square surrounds the pursuer if they were within 10 vehicle-lengths of the prey and a pink square around the prey of if they were within 2 vehicle-lengths.
The initial population showed some behaviours like moving in a series of small arcs near the prey, and periodic straight line runs interspersed with tight spins (see Figure 18).
Figure 18. Initial Individuals. (left) 392 (right) 242
After evolving for 40 generations, the agent's behaviour improved to the point where it followed the prey quite well (see Figure 19). Not entirely unsurprising, the perceptual field of top-performing individuals (Figure 20) resembled the field using by agents to solve the seek problem (Figure 10).
Figure 19. Mature Individuals. From left to right, individuals 19817 and 19976.
Figure 20. Perceptual Field for individual 19817.
The fitness graph shows that the best performing individuals were found around generation 19 (see Figure 19). So, although the evolved strategy is similar to seek, the difficulty of evolving a successful solution is greater since the prey is not simply a stationary target.
Figure 21. Fitness Graph
I re-implemented a portion of the 1994 work by Reynolds . Cliff and Miller provide another example of co-evolution of pursuit and evasion relevant to the tag-playing problem .
I adopted the same fitness function as Reynolds: an agent tries to maximize the number of time steps where they are not. This is an interesting function considering that the population is co-evolving. One should expect the mean fitness to actually decrease to 50% on successive generations as the population becomes more efficient as a whole.
The average fitness was calculated by running 6 simulations with 4 randomly chosen opponents from the population for a total of 24 simulations. Each simulation ran for 25 time steps. The opponent's positions were initialized within an 8 vehicle-length square with random headings. The agent that was it had a velocity of 1.25 vehicle-lengths and the not agent assigned a velocity of one vehicle-length. During the 6 simulations with a selected opponent, the player starting as it was alternated each time. As before, at each time step of the simulation, the agent's steering angle was computed using the evolved program and the position of the agents updated. An opponent was tagged if it came within 1.5 vehicle-lengths of it. I didn't normalize the fitness score so it has a range of 0 to 25.
All standard functions and constants described above were used with the addition of:
position of opponent in local coordinates
(if-it a b)
if agent is it do a, else do b
The code fragments were generated and evolved such that the if-it function always remained at the root of the program tree. No other occurrences of if-it appeared in the rest of the code. This follows Reynolds's technique for his most successful run -- he discusses other experiments without this constraint in his paper .
A larger population was selected to increase the chance of a good strategy being found. As stated earlier, the best fitness of an individual shouldn't increase significantly, since the entire population is evolving together. A graph of fitness values illustrates this trend (see Figure 22).
Figure 22. Fitness Graph
After 30 generations, individuals developed reasonably good strategies where most games resulted in fairly even pairings (see Figure 23).
Figure 23. Back and Forth Games. If individuals like 27778 don't quickly turn and run after making a tag, there is a good chance they'll be re-tagged again. A successful strategy requires a quick about face after making a tag.
Figure 24. Individual 29200. (left) 29200 successfully pursues the opponent, makes a tag, then escapes. (centre) 29200 avoids getting tagged. (right) another successful tag and escape by 29200
Evolved code for individual 29200 with fitness 18.417:
(if-it (min (- (min (+ local-y 0.485488) (- (min (+ local-y 0.485488) 0.0715659) local-x)) local-x) 0.879604) (iflte (iflte local-y (- (abs local-y) local-y) local-x 0.703787) (- (abs (min local-y local-y)) local-y) (min local-x (+ local-y (abs local-x))) 0.703787))
Figure 25. Perceptual Field Map of 29200. (left) When 29200 is it. (right) when not
Figure 25 shows the perceptual fields for individual 29200 when it and not. When it, the arrows in the bottom two quadrants show how the agent will pursue the opponent by turning around. The arrows in the top two quadrants illustrate 29200's technique of wobbling back and forth to pursue its opponent. When not it, the arrows in the bottom two quadrants predominately point upwards showing how 29200 runs from the opponent. The arrows in the top quadrant show how 29200 makes a sharp left turn if the opponent is in front.
The corral behaviour is a more complex steering behaviour involving three agents. The goal is for a faster individual (the wrangler) to assist a slower individual (the pursuer) in catching prey. This scenario could occur where an experienced wrangler is teaching a younger pursuer how to hunt. Or, the wrangler could be an agile but relatively weak member of a hunting pack and the pursuer relatively slow, but strong enough to actually kill the prey. Corral can be thought of as a higher-level steering behaviour encompassing seek, flee, and offset pursuit (see Reynolds's for an explanation of offset pursuit) .
The environment consists of three vehicles, the wrangler in purple, the slower agent in cyan, and the prey in yellow. The agents have the following constant velocities as multiples of one vehicle-length: wrangler 1.25, pursuer 0.75, and prey 1.0. The wrangler is the evolved agent whereas the pursuer and prey use optimal seek and flee behaviours respectively. The prey flees from either the wrangler or the pursuer, depending which is closest.
All standard functions and constants described previously were used with the addition of:
position of the prey in the wrangler's local coordinates
position of the pursuer in the wrangler's local coordinates
The fitness function essentially was to minimize the time required for the slower pursuer to catch the prey. Since this fitness function results in many null scores, I added a small bonus if the pursuer was close to catching the prey. This ranged from 0 to 1 based on the improvement in distance between pursuer and prey on the last time step compared to the first time step. The hope was that this would indicate that some corralling occurred during the simulation even though the pursuer didn't catch the prey. The final value for fitness was expressed as the total possible simulation steps minus the number of steps taken for the pursuer to catch the prey plus the bonus for distance. Thus, larger scores represent better performance with the highest possible score being the number of simulation steps plus one.
The average fitness was calculated by running 5 simulations with each simulation consisting of 50 time steps, or less if the prey was captured. A simulation was initialized with all three agents placed randomly in a 20 vehicle-length square with random headings. To increase the uniformity of the trials, the prey and pursuer where positioned so that they were at least 2 vehicle-lengths apart. At each time step of the simulation, the wrangler's steering angle was computed using the evolved program and the position of the agents updated accordingly. If the pursuer came within 1.5 vehicle-lengths of the prey, the prey was considered caught and the simulation ended.
After only 10 generations, some reasonably successful behaviour emerged with the mean fitness reaching 8.15 and best fitness 34.1 (see Figure 27). The wrangler would attempt to outflank the prey and steer it towards the pursuer (see Figure 26 left). The efficiency of the wrangler however, was less than optimal with many bouts of confusion resulting in simulations were the prey wasn't caught (see Figure 26 right).
Figure 26. Individual 2572. The left shows a successful corral and the right shows where 2572's technique fails.
Evolved code for individual 2572 with fitness 21.874:
(- (+ (% (min (iflte (min (% (max (+ (* helper-local-x (iflte local-x (* helper-local-x (min (min helper-local-y (+ helper-local-y (+ helper-local-x local-y))) local-x)) 0.245582 local-y)) local-x) helper-local-y) 0.373089) local-y) helper-local-x local-y local-y) 0.0556963) helper-local-y) helper-local-y) helper-local-x)
Figure 27. Fitness Graph
After 30 generations, the mean fitness increased to 10.65 and the best agent had a fitness score of 34.31 as the population of agents became more adept at corralling the prey. In some cases the flanking behaviour had become much cleaner and fluid with an additional sub-strategy of criss-cross and confuse: the wrangler would dart back and forth in front of the prey to force it to switch direction quickly slowing'™s progress (see Figure 28).
Figure 28. Individual 7333. Two successful corrals showing 7333's surround technique.
Evolved code for individual 7333 with fitness 34.318:
(abs (% 0.289254 (max (% (abs (max (% (abs (max local-y 0.447554)) local-y) 0.447554)) (max (iflte (+ (* (abs (max (% (abs (% local-x local-y)) (max 0.530107 helper-local-x)) 0.447554)) (min (* (max local-y 0.447554) (abs (* (* local-y helper-local-x) helper-local-x))) 0.296396)) helper-local-x) (* (abs (max local-y helper-local-y)) 0.530107) local-y helper-local-y) helper-local-x)) (max local-y 0.447554))))
Figure 29. More Examples After 40 generations.
After evolving 41 generations, the basic strategy didn't improve significantly statistically -- the mean fitness seems to have actually peeked around 14 generations with a value just below 11. Note that the evolutionary step allows the most-fit individual to be killed with some small probability, which explains the decline in best fitness at generations 18, 27, 34, and 41. This could be tuned to reduce this probability, but allowing this event promotes exploration of the evolved solutions.
By increasing the population to 1000, I expected the evolved behaviour to be more robust and diverse since there would be a larger set of genetic material to draw from. I also expected the larger population to converge more slowly. After 20 generations, the best fitness did not increase above those in the smaller population, but the mean fitness had increased to 16.7 (see Figure 30).
Figure 30. Fitness Graph
Some slightly improved behaviour was noticeable, for example individuals tended to loop right around the prey. This would effectively bring the prey to a standstill, as the prey's flee behaviour would continually adjust the steering direction keeping it in a small space (see Figure 31 and Figure 32). The perceptual field is more difficult to interpret in this behaviour than in previous cases, since'™s dependent on four variables: the local coordinates of the prey and the local coordinates of the pursuer. For this reason I haven't included it.
Figure 31. Individual 10017. This individual uses an interesting technique where it erratically moves all around the prey in an attempt to slow'™s progress.
Evolved code for individual 10017 with fitness 34.933:
(+ 0.147282 (min (+ (* (abs (iflte (min (+ (% (abs (iflte (% (* local-y helper-local-y) (* (min (* helper-local-x local-y) (iflte local-y local-y 0.266762 (max local-y local-y))) local-y)) 0.980255 helper-local-y local-y)) local-x) 0.634785) helper-local-y) helper-local-y helper-local-y (* (* (max 0.195471 helper-local-x) (* local-y 0.980255)) (abs (* local-y (* (min local-y helper-local-x) 0.875546)))))) local-x) 0.634785) helper-local-y))
Figure 32. Individuals 7231 (left) and 7603 (centre and right). These individuals use a combination of the erratic movement, with the outflank technique. 7603 uses a series of small arcs to outflank and, on the right, small arcs combined with looping to also stall the prey's progress.
Evolved code for individual 7231 with fitness 30.897:
(+ 0.266762 (min (+ (* (abs (iflte (% (min helper-local-y helper-local-x) (* (min (* (* local-y (* (min (* (* local-y helper-local-y) local-y) helper-local-x) 0.875546)) (min (abs helper-local-x) helper-local-x)) helper-local-x) local-x)) (abs local-x) helper-local-y helper-local-y)) local-x) 0.634785) 0.9588))
Although the experimental results showed that genetic programs can be evolved to solve a range of steering behaviours, many of the solutions are far from optimal. For example, the perceptual field diagrams for Tag show how the evolved steering inputs given the local coordinate position tend to be very coarse and sudden. I believe that this is due to the extremely simple, frictionless vehicle model. Another factor may be that the fitness functions don't measure the style of solutio€“ for example, adding a penalty for large steering angles could help to reduce the coarse steering behaviour.
The evolved population also lacked the range of diversity I had hoped with a single strategy tending to quickly dominate the entire population. My hope was to have several relatively distinct individuals all employing radically different techniques to solve a given problem. I did find that with larger populations, more diversity was evident, as seen in the Tag problem. I also expect that spending more time tuning the genetic programming parameters for mutation rate and uniform versus tournament replacement selection may help increase diversity.
Finally, while debugging and testing different fitness functions, it became apparent how important visualization of the simulation is. Producing trails, viewing the perceptual field, and viewing the local coordinate space of the individual all helped to identify shortcomings and advantages of different fitness functions.
Below is a list of ways to extend and improve the work:
§ spend more time tweaking parameters of the genetic algorithms and testing different fitness functions.
§ evolve agents which exist in 3D space
§ add obstructions and/or restrict the extents of the environment
§ adding style points to fitness functions like rewarding gentle steering angles
§ add momentum and mass to physically modelled environment
§ reduce evolution times by adding stopping criteria.
§ refine the tool kit into a state where the source-code could be made available to others to test and prototype evolutionary solutions using genetic programming
§ use genetic programming to evolve higher level steering behaviours by using the output to select between a set of possible low-level steering behaviours at a single time step