Designing AI: Solving Snake with Evolution

Recently, I’ve become obsessed with artificial intelligence. Specifically, scenarios that enable AI to learn how to accomplish an abstract goal without ever being given labeled training data or explicit instructions.
Artificial intelligence can be an overhyped, misused, and often confusing term. Instead of another long rant about how AI will change your life (it will) or steal your job (it won’t), this article will center on a concrete and familiar task:
The Game of Snake

Snake has simple rules:
- The world is a grid.
- The snake can only travel orthogonally along this grid.
- This world has a border that kills the snake on contact.
- The snake cannot stop moving.
- If the snake runs into itself, it dies.
- Every time the snake eats, it grows longer.
- The goal is to grow as long as possible.
When playing the game, there is a decision to make each time the snake takes a step forward: continue straight, turn left, or turn right.
Our goal is to create an AI that learns to make this same decision: first assessing the state of the world the snake lives in, then choosing the move that will keep it alive and growing longer.

Choosing a Method
There are many methods, algorithms, and techniques that could be used to solve Snake. Some of these could fall under the umbrella term of AI. I’m going to focus on a single method: genetic random mutation of a neural network.
This is because:
- I don’t have a dataset of high-scoring Snake playthroughs to train a neural network by example.
- I’m personally interested in seeing whether logic that can play Snake can evolve through random mutation alone.
Genetic random mutation of a neural network is likely the most unfamiliar phrase in this article for many readers — so let’s break down what is happening under the hood before going any further.

What Is a Neural Network?
A neural network is a kind of algorithm that can be used to determine the abstract relationship between some input data and a desired output. Typically, this is accomplished by training a neural network on thousands of examples. Over time the network will begin to identify the aspects of the input data that are most useful to determine the desired outcome. To achieve this, the neural network slowly adjusts coefficients and weights used in a series of complex formulas to process the input data as it is shown each additional example.
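To make the "series of complex formulas" less abstract, here is a minimal sketch of how one of those formulas works: each neuron takes a weighted sum of its inputs and squashes the result into the 0-to-1 range. The layer sizes and weights below are illustrative, not from the actual Snake project.

```python
import math

def sigmoid(x):
    # Squash any number into the 0..1 range.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden neuron computes a weighted sum of the inputs,
    # then applies the sigmoid. These weights are the "knobs" that
    # training (or, later, random mutation) adjusts.
    hidden = [sigmoid(sum(w * i for w, i in zip(ws, inputs)))
              for ws in hidden_weights]
    # The output layer does the same over the hidden activations.
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden)))
            for ws in output_weights]
```

With two inputs, two hidden neurons, and one output, `forward([1.0, 0.0], [[0.5, -0.2], [0.3, 0.8]], [[1.0, -1.0]])` returns a single number between 0 and 1.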
Neural networks come in many shapes, sizes, and varieties: convolutional, recurrent, long-short-term-memory, etc. Designing the right configuration for a given problem can be difficult, confusing, and feel like a bit of a dark art. This is where genetics come in.
What is a Genetic Algorithm?
Instead of picking a network type and then slowly training it based on example Snake gameplay, we are going to create a scenario for one to evolve on its own.
All changes to the neural nets will be random, not driven by direct move-by-move feedback from the game. Over time, small random changes to the neural networks should lead to a fully functioning AI as the top performers in each generation survive to breed the next.
Our evolutionary process is going to function like so:
- Randomly tweak the knobs and cables driving our neural network to create an initial set of unique versions.
- Let each of those neural nets play Snake.
- After every neural net has finished a game, select which neural nets performed best.
- Create a new generation of unique neural networks based on randomly tweaking those top performing neural nets.
- Repeat from step 2.
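The loop above can be sketched in a few lines. The genome here is just a flat list of weights, and `fitness` stands in for an actual game of Snake, so the function names and the fitness itself are placeholders, not the real project's code.

```python
import random

POPULATION = 55   # generation size used later in the article
SURVIVORS = 6     # top performers that breed the next generation
MUTATION_RATE = 0.1

def random_genome(n=12):
    # Step 1: a randomly tweaked set of network weights.
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

def mutate(genome):
    # Step 4: randomly nudge a few of a top performer's weights.
    return [w + random.gauss(0, 0.5) if random.random() < MUTATION_RATE else w
            for w in genome]

def evolve(fitness, generations=10):
    # Step 1: build the initial population.
    population = [random_genome() for _ in range(POPULATION)]
    for _ in range(generations):
        # Steps 2-3: let each net "play" and rank them by score.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:SURVIVORS]
        # Steps 4-5: breed a fresh generation from mutated parents.
        population = [mutate(random.choice(parents)) for _ in range(POPULATION)]
    return max(population, key=fitness)
```

In the real system, `fitness` would run a full game of Snake and return the score.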
So now we can just relax and let our AI evolve naturally, right? Wrong.
Artificial Intelligence Still Needs a Designer
The genetic algorithm replaces the need for upfront training data, but it is up to us (the designer) to design the larger system that enables this to work. Specifically, we need to choose the input data, output data, and decide what defines good performance in Snake. To channel our synthesizer metaphor from above: we still need to make a keyboard, a speaker, and decide what kind of sound we want to hear.
A decent first approach for our input data is to provide the neural networks with the same information that we have. We play the game by looking at the screen: the colors of the pixels that make up the game’s environment. However, this would require the neural net to form connections that represent nearly every rule of Snake as described earlier. Learning about the walls, where the snake is, its direction, what food is and how to find it. The input data would need to be the color of every pixel of the game: hundreds or maybe even thousands of inputs. This is by no means impossible — but it is a lot more complicated than it needs to be.
Design from AI’s Perspective
Imagine playing Snake from a first-person point of view. Be the snake. Give the world some depth and picture yourself making left and right turns to avoid the giant walls of the world and the giant moving “walls” of your body and tail.

You really only need to know two things to play this version of Snake:
- Which direction is the food?
- Which direction can you move without dying?
As the neural net forms the connection between moving and dying, the trait needed to avoid both the environment’s walls and the snake’s own body is learned in a single step. Additionally, instead of describing to the neural net where it is and where the food is, we can describe the food as “straight ahead”, “to the right”, “to the left”, or “behind you”. This removes the need for the neural net to understand the size of the environment, where the various objects are within that environment, and even which direction it is heading. By designing from the point of view of our AI, we’ve significantly simplified the problem it has to solve.
Don’t Chase the Hype, Solve the Problem
Purists may argue that we’ve somehow cheated by designing out the hard parts — I strongly disagree. Unless your goal is to work towards some level of generalized AI (a solution that could be applied to problems other than just Snake), a neural network is just another ingredient technology and should be treated as such. Choosing carefully how to use any ingredient will lead to a faster and more understandable solution.
Could you solve Snake using rule-based methods? Of course. We are merely trying to see if given the right conditions, those rules can be developed by random chance.
So where does that leave our design? Resembling something like this:

Each piece of information given to and received from a neural network needs to be between zero and one. To satisfy this, we’ve broken up all of the input data into yes or no questions about each relative direction. It’s important to note that the neural net has no information about what these numbers mean or even that they are in two sets of three. To the AI, it’s just a list of 6 numbers.
Our output data is broken out in a similar way. We are asking our neural net to give us three numbers. We are simply going to pick the highest number and use that to move the snake in the direction described above. The neural net doesn’t start off knowing what these numbers are for — or how they will be used.
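The interface described above can be sketched as follows. The function and variable names are illustrative; all that matters is six 0-or-1 inputs going in and the highest of three outputs coming out.

```python
def encode_inputs(food_left, food_ahead, food_right,
                  clear_left, clear_ahead, clear_right):
    # Two sets of three yes/no questions, flattened into one list of
    # six numbers between 0 and 1. The net never learns what they mean.
    return [float(food_left), float(food_ahead), float(food_right),
            float(clear_left), float(clear_ahead), float(clear_right)]

MOVES = ["left", "straight", "right"]

def choose_move(outputs):
    # Pick whichever of the three output numbers is highest.
    return MOVES[outputs.index(max(outputs))]
```

So if the net's three outputs were `[0.1, 0.9, 0.3]`, the snake would continue straight ahead.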
We are going to generate many versions of the neural net and allow each to have an attempt at moving the snake around. Those that perform best have made a connection between some aspect of the input and output data that keeps it alive. Over time we are going to continue to tweak those that got the highest scores — and eventually we’ll have an AI that plays Snake.
Defining Good
This brings us to our final design problem: what defines good performance in Snake? This is known as the reward function within the reinforcement learning domain.
A good first step is to replicate the game’s scoring mechanism: the longer the snake grows, the better. This works well enough, but it takes a long time because it isn’t obvious to the neural network that steps toward the food are good for it. We are trying to keep this problem simple, so let’s start by giving 1 point for each step the snake takes toward the food and 10 points each time it reaches the food.

Ba-da-bing ba-da-boom, right?
Wrong.
Our definition of good created a loophole. By turning in circles, the AI is able to repeatedly gain points without dealing with the added risk of getting near walls or bumping into its growing tail. Reward functions are hard to get right and very easy to get wrong. In some scenarios, a human-in-the-loop is needed to say “this is the preferred result” instead of forcing an objective reward function on a subjective problem.
Luckily, this loophole is an easy fix. We can simply adjust our reward function to include a slight penalty for stepping away from the food. Let’s keep 1 point for moving towards the food and now subtract 1.5 points when moving away. Moving in circles will now create a negative score over time.
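The adjusted reward function fits in a few lines. This is a sketch, assuming distances are Manhattan distances on the grid (so every orthogonal step moves the snake either toward or away from the food); the function name is my own.

```python
def step_reward(prev_dist, new_dist, ate_food):
    # Distances are grid (Manhattan) distances to the food.
    if ate_food:
        return 10.0   # reached the food
    if new_dist < prev_dist:
        return 1.0    # stepped toward the food
    return -1.5       # stepped away: circling now bleeds points
```

One step toward the food followed by one step away nets -0.5 points, so endless circling produces an ever-more-negative score.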
Evolution That Can Be Witnessed
Finally, we need to decide the size of each generation and how many neural nets will be chosen to breed the next. Since I want to make AI that is accessible and easy for anyone to witness, I need this to be able to run smoothly in just a web browser. This means picking a relatively small population size. Since a 5x11 grid fit well in my browser, I’ve chosen 55 — with the top 6 (about 10%) used to breed the next generation.
Each generation will last until all 55 have died or those that remain all have negative scores. This causes the first few generations to be very short, until the mutation of “turn if something is in front of me” occurs and is passed down to subsequent generations.
Since we simplified the problem and carefully thought through the design of our AI, it doesn’t take long for neural nets to evolve to meet our performance criteria. After a few dozen generations, improvement levels off significantly. Each generation still has some mutations that aren’t helpful, as a few snakes turn in circles generating lower and lower scores while the top performing snakes push generations to last longer and longer.

After 100 generations, we can see that the evolved strategy actually requires the neural nets to lose points at times. Instead of taking the fastest route, they move away from the food to reposition and safely get their bodies out of the way. This is surprising since the neural nets have no sense of the overall size of their environment and can only “see” one space ahead. How this trait evolved is puzzling, and while every attempt at re-evolving generates a different result, a variation of this behavior always seems to emerge eventually.
Try not to be distracted by the AI hype, shifting industry definitions, or bad marketing campaigns from big tech. Focus on the task at hand and diligently describe what you are expecting your AI to do.
AI can be a powerful solution for many problems — but it still needs a designer. Good designers think diligently through complex problems and walk a mile in their user’s shoes. Designing AI is no different.
Credits
Many thanks to Thomas Wagenaar, creator of Neataptic.js. This project relies heavily on his open source work and likely wouldn’t exist without it.
Want to try this in your browser?
Tweak, tune, and evolve your own Snake-playing AI — no software required.