Designing AI: Solving Snake with Evolution

Recently, I’ve become obsessed with artificial intelligence. Specifically, scenarios that enable AI to learn how to accomplish an abstract goal without ever being given labeled training data or explicit instructions.
Artificial intelligence can be an overhyped, misused, and often confusing term. Instead of another long rant about how AI will change your life (it will) or steal your job (it won’t), this article will center on a concrete and familiar task:
The Game of Snake

Snake has simple rules:
- The world is a grid.
- The snake can only travel orthogonally along this grid.
- This world has a border that kills the snake on contact.
- The snake cannot stop moving.
- If the snake runs into itself, it dies.
- Every time the snake eats, it grows longer.
- The goal is to grow as long as possible.
When playing the game, there is a decision to make each time the snake takes a step forward: continue straight, turn left, or turn right.
Our goal is to create an AI that learns to make this same decision: first assessing the state of the world the snake lives in, then choosing the move that will keep it alive and growing longer.

Choosing a Method
There are many methods, algorithms, and techniques that could be used to solve Snake. Some of these could fall under the umbrella term of AI. I’m going to focus on a single method: genetic random mutation of a neural network.
This is because:
- I don’t have a dataset of high-scoring Snake playthroughs to train a neural network by example.
- I’m personally interested in seeing whether logic that can play Snake can evolve through random mutation alone.
Genetic random mutation of a neural network is likely the most unfamiliar phrase in this article for many readers — so let’s break down what is happening under the hood before going any further.

What Is a Neural Network?
A neural network is a kind of algorithm that can be used to determine the abstract relationship between some input data and a desired output. Typically, this is accomplished by training a neural network on thousands of examples. Over time the network will begin to identify the aspects of the input data that are most useful to determine the desired outcome. To achieve this, the neural network slowly adjusts coefficients and weights used in a series of complex formulas to process the input data as it is shown each additional example.
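To make the "series of complex formulas" less abstract, here is a minimal sketch of how one of those formulas works: each neuron takes a weighted sum of its inputs and squashes the result into the 0-to-1 range. The layer sizes and weights below are illustrative, not from the actual Snake project.

```python
import math

def sigmoid(x):
    # Squash any number into the 0..1 range.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden neuron computes a weighted sum of the inputs,
    # then applies the sigmoid. These weights are the "knobs" that
    # training (or, later, random mutation) adjusts.
    hidden = [sigmoid(sum(w * i for w, i in zip(ws, inputs)))
              for ws in hidden_weights]
    # The output layer does the same over the hidden activations.
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden)))
            for ws in output_weights]
```

With two inputs, two hidden neurons, and one output, `forward([1.0, 0.0], [[0.5, -0.2], [0.3, 0.8]], [[1.0, -1.0]])` returns a single number between 0 and 1.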
Neural networks come in many shapes, sizes, and varieties: convolutional, recurrent, long-short-term-memory, etc. Designing the right configuration for a given problem can be difficult, confusing, and feel like a bit of a dark art. This is where genetics come in.
What is a Genetic Algorithm?
Instead of picking a network type and then slowly training it based on example Snake gameplay, we are going to create a scenario for one to evolve on its own.
All changes to the neural nets will be random, not driven by direct move-by-move feedback from the game. Over time, small random changes to the neural networks should lead to a fully functioning AI as the top performers in each generation survive to breed the next.
Our evolutionary process is going to function like so:
- Randomly tweak the knobs and cables driving our neural network to create an initial set of unique versions.
- Let each of those neural nets play Snake.
- After every neural net has finished a game, select which neural nets performed best.
- Create a new generation of unique neural networks based on randomly tweaking those top performing neural nets.
- Repeat from step 2.
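The loop above can be sketched in a few lines. The genome here is just a flat list of weights, and `fitness` stands in for an actual game of Snake, so the function names and the fitness itself are placeholders, not the real project's code.

```python
import random

POPULATION = 55   # generation size used later in the article
SURVIVORS = 6     # top performers that breed the next generation
MUTATION_RATE = 0.1

def random_genome(n=12):
    # Step 1: a randomly tweaked set of network weights.
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

def mutate(genome):
    # Step 4: randomly nudge a few of a top performer's weights.
    return [w + random.gauss(0, 0.5) if random.random() < MUTATION_RATE else w
            for w in genome]

def evolve(fitness, generations=10):
    # Step 1: build the initial population.
    population = [random_genome() for _ in range(POPULATION)]
    for _ in range(generations):
        # Steps 2-3: let each net "play" and rank them by score.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:SURVIVORS]
        # Steps 4-5: breed a fresh generation from mutated parents.
        population = [mutate(random.choice(parents)) for _ in range(POPULATION)]
    return max(population, key=fitness)
```

In the real system, `fitness` would run a full game of Snake and return the score.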
So now we can just relax and let our AI evolve naturally, right? Wrong.
Artificial Intelligence Still Needs a Designer
The genetic algorithm replaces the need for upfront training data, but it is up to us (the designer) to design the larger system that enables this to work. Specifically, we need to choose the input data, output data, and decide what defines good performance in Snake. To channel our synthesizer metaphor from above: we still need to make a keyboard, a speaker, and decide what kind of sound we want to hear.
A decent first approach for our input data is to provide the neural networks with the same information that we have. We play the game by looking at the screen: the colors of the pixels that make up the game’s environment. However, this would require the neural net to form connections that represent nearly every rule of Snake as described earlier. Learning about the walls, where the snake is, its direction, what food is and how to find it. The input data would need to be the color of every pixel of the game: hundreds or maybe even thousands of inputs. This is by no means impossible — but it is a lot more complicated than it needs to be.
Design from AI’s Perspective
Imagine playing Snake from a first-person point of view. Be the snake. Give the world some depth and picture yourself making left and right turns to avoid the giant walls of the world and the giant moving “walls” of your body and tail.

You really only need to know two things to play this version of Snake:
- Which direction is the food?
- Which direction can you move without dying?
As the neural net forms the connection between moving and dying, the trait needed to avoid both the environment’s walls and the snake’s own body is learned in a single step. Additionally, instead of describing to the neural net where it is and where the food is, we can describe the food as “straight ahead”, “to the right”, “to the left”, or “behind you”. This removes the need for the neural net to understand the size of the environment, where the various objects are within that environment, and even which direction it is heading. By designing from the point of view of our AI, we’ve significantly simplified the problem it has to solve.
Don’t Chase the Hype, Solve the Problem
Purists may argue that we’ve somehow cheated by designing out the hard parts — I strongly disagree. Unless your goal is to work towards some level of generalized AI (a solution that could be applied to problems other than just Snake), a neural network is just another ingredient technology and should be treated as such. Choosing carefully how to use any ingredient will lead to a faster and more understandable solution.
Could you solve Snake using rule-based methods? Of course. We are merely trying to see if given the right conditions, those rules can be developed by random chance.
So where does that leave our design? Resembling something like this:

Each piece of information given to and received from a neural network needs to be between zero and one. To satisfy this, we’ve broken up all of the input data into yes or no questions about each relative direction. It’s important to note that the neural net has no information about what these numbers mean or even that they are in two sets of three. To the AI, it’s just a list of 6 numbers.
Our output data is broken out in a similar way. We are asking our neural net to give us three numbers. We are simply going to pick the highest number and use that to move the snake in the direction described above. The neural net doesn’t start off knowing what these numbers are for — or how they will be used.
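The interface described above can be sketched as follows. The function and variable names are illustrative; all that matters is six 0-or-1 inputs going in and the highest of three outputs coming out.

```python
def encode_inputs(food_left, food_ahead, food_right,
                  clear_left, clear_ahead, clear_right):
    # Two sets of three yes/no questions, flattened into one list of
    # six numbers between 0 and 1. The net never learns what they mean.
    return [float(food_left), float(food_ahead), float(food_right),
            float(clear_left), float(clear_ahead), float(clear_right)]

MOVES = ["left", "straight", "right"]

def choose_move(outputs):
    # Pick whichever of the three output numbers is highest.
    return MOVES[outputs.index(max(outputs))]
```

So if the net's three outputs were `[0.1, 0.9, 0.3]`, the snake would continue straight ahead.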
We are going to generate many versions of the neural net and allow each to have an attempt at moving the snake around. Those that perform best have made a connection between some aspect of the input and output data that keeps it alive. Over time we are going to continue to tweak those that got the highest scores — and eventually we’ll have an AI that plays Snake.
Defining Good
This brings us to our final design problem: what defines good performance in Snake? This is known as the reward function within the reinforcement learning domain.
A good first step is to replicate the game’s scoring mechanism: the longer the snake grows, the better. This works well enough, but it takes a long time because it isn’t obvious to the neural network that steps toward the food are good for it. We are trying to keep this problem simple, so let’s start by giving 1 point for each step the snake takes toward the food and 10 points each time it reaches the food.

Ba-da-bing ba-da-boom, right?
Wrong.
Our definition of good created a loophole. By turning in circles, the AI is able to repeatedly gain points without dealing with the added risk of getting near walls or bumping into its growing tail. Reward functions are hard to get right and very easy to get wrong. In some scenarios, a human-in-the-loop is needed to say “this is the preferred result” instead of forcing an objective reward function on a subjective problem.
Luckily, this loophole is an easy fix. We can simply adjust our reward function to include a slight penalty for stepping away from the food. Let’s keep 1 point for moving towards the food and now subtract 1.5 points when moving away. Moving in circles will now create a negative score over time.
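The adjusted reward function fits in a few lines. This is a sketch, assuming distances are Manhattan distances on the grid (so every orthogonal step moves the snake either toward or away from the food); the function name is my own.

```python
def step_reward(prev_dist, new_dist, ate_food):
    # Distances are grid (Manhattan) distances to the food.
    if ate_food:
        return 10.0   # reached the food
    if new_dist < prev_dist:
        return 1.0    # stepped toward the food
    return -1.5       # stepped away: circling now bleeds points
```

One step toward the food followed by one step away nets -0.5 points, so endless circling produces an ever-more-negative score.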
Evolution That Can Be Witnessed
Finally, we need to decide the size of each generation and how many neural nets will be chosen to breed the next. Since I want to make AI that is accessible and easy for anyone to witness, I need this to be able to run smoothly in just a web browser. This means picking a relatively small population size. Since a 5x11 grid fit well in my browser, I’ve chosen 55 — with the top 6 (about 10%) used to breed the next generation.
Each generation will last until all 55 have died or those that remain all have negative scores. This causes the first few generations to be very short, until the mutation of “turn if something is in front of me” occurs and is passed down to subsequent generations.
Since we simplified the problem and carefully thought through the design of our AI, it doesn’t take long for neural nets to evolve to meet our performance criteria. After a few dozen generations, improvement levels off significantly. Each generation still has some mutations that aren’t helpful, as a few snakes turn in circles generating lower and lower scores while the top performing snakes push generations to last longer and longer.

After 100 generations, we can see that the evolved strategy actually requires the neural nets to lose points at times. Instead of taking the fastest route, they move away from the food to reposition and safely get their bodies out of the way. This is surprising since the neural nets have no sense of the overall size of their environment and can only “see” one space ahead. How this trait evolved is puzzling, and while every attempt at re-evolving generates a different result, a variation of this behavior always seems to emerge eventually.
Try not to be distracted by the AI hype, shifting industry definitions, or bad marketing campaigns from big tech. Focus on the task at hand and diligently describe what you are expecting your AI to do.
AI can be a powerful solution for many problems — but it still needs a designer. Good designers think diligently through complex problems and walk a mile in their user’s shoes. Designing AI is no different.
Credits
Many thanks to Thomas Wagenaar, creator of Neataptic.js. This project relies heavily on his open source work and likely wouldn’t exist without it.
Want to try this in your browser?
Tweak, tune, and evolve your own Snake-playing AI — no software required.