Episode 1 — Genetic Algorithm for Reinforcement Learning

--

In this article, we describe how a genetic algorithm can be used to solve a reinforcement learning problem. We demonstrate this by solving the “Frozen-Lake” problem from OpenAI Gym.

Genetic Algorithm

A genetic algorithm is an example of an “evolutionary computation” algorithm, a family of AI algorithms inspired by biological evolution. These methods are regarded as meta-heuristic optimization methods, which means that they can be useful for finding good solutions to optimization (maximization or minimization) problems, but they provide no guarantee of finding the global optimum.

Requirements of genetic algorithm

Solving a problem with a genetic algorithm requires representing its solutions as chromosomes (e.g. arrays of bits), and it also requires a fitness function that can be used to evaluate candidate solutions.

Steps of genetic algorithm

A genetic algorithm works by maintaining a pool of candidate solutions (called a generation). Iteratively, the current generation evolves to produce the next generation, whose candidate solutions tend to have higher fitness values than those of the previous one. This process is repeated for a pre-specified number of generations or until a solution with the desired fitness value is found.

Steps of Genetic Algorithm (Image Credit: The genetic algorithm explained)

The next generation is created from the current generation in a biologically inspired manner that consists of 3 steps (a short code sketch follows the list):

  • Selection: we evaluate the fitness of the members of the current generation, then select the subset with the best fitness values to act as parents for the next generation. In short, survival of the fittest.
  • Crossover: we merge pairs of the selected parents to generate new offspring solutions. Crossover can take different forms; the simplest is one-point crossover, which splits the string representation of each parent into two parts at the same position, then concatenates the first part of one parent with the second part of the other to form the offspring’s representation.
Image credit: CreationWiki: http://creationwiki.org/Genetic_algorithm
  • Mutation: in biology, mutation happens with low probability, where a child can have a feature that was not inherited from its parents. Likewise, in a genetic algorithm the mutation step perturbs the offspring solution with very small probability (e.g. flipping one bit of the bitstring representation of the solution).
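To make these steps concrete, here is a minimal sketch of one generation step in Python. The function and parameter names (next_generation, n_parents, mutation_rate) are our own illustrative choices, assuming solutions are encoded as fixed-length lists of integers and fitness maps a solution to a score:

import random

def next_generation(population, fitness, n_parents=20, mutation_rate=0.05, n_values=4):
    # Selection: keep the candidates with the best fitness values as parents.
    parents = sorted(population, key=fitness, reverse=True)[:n_parents]
    children = []
    while len(children) < len(population):
        p1, p2 = random.sample(parents, 2)
        # One-point crossover: splice the two parents at a random position.
        point = random.randint(1, len(p1) - 1)
        child = p1[:point] + p2[point:]
        # Mutation: replace each gene with a random value, with small probability.
        child = [random.randrange(n_values) if random.random() < mutation_rate else g
                 for g in child]
        children.append(child)
    return children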

Frozen Lake problem

Now, let’s demonstrate how a genetic algorithm can be used to solve the Frozen-Lake problem from OpenAI Gym.

Problem description

Imagine you are standing on top of a frozen lake, where the map of the surface is described for you as a grid like the following:

SFFF   (S: starting point, safe)
FHFH   (F: frozen surface, safe)
FFFH   (H: hole, fall to your doom)
HFFG   (G: goal, where the frisbee is located)

Your initial position is marked S and you want to reach the position marked G. However, if you step onto any position marked H you will fall into the water, with no way back. Moreover, the surface of the lake is slippery, so your movement actions may not be executed as intended. For example, when you try to move forward, your actual move may be to the right or to the left.

At each step you get a reward of 0, except when you reach the goal, where you get a reward of 1.
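As a quick illustration, the environment can be loaded with a few lines of the classic Gym API (this assumes a Gym version from the article’s era, where step returns a 4-tuple; newer Gym/Gymnasium releases use a different signature):

import gym

env = gym.make("FrozenLake-v0")      # the slippery 4x4 map shown above
state = env.reset()                  # start at S (state 0)
action = env.action_space.sample()   # pick a random move
state, reward, done, info = env.step(action)
env.render()                         # print the grid and the agent's position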

Remember from our previous article that you can model reinforcement learning problems as a Markov Decision Process. In this problem, both the set of actions and the set of states are discrete. The set of states S includes the 16 possible locations in the grid (we can number them 0 to 15). The set of actions A includes the possible moves: forward, backward, left, or right, which we can denote 0, 1, 2, 3. The reward and next-state models are stochastic, p(r_{t+1} | s_t, a_t) and p(s_{t+1} | s_t, a_t), since the lake surface is slippery and the same movement action can lead to different next states and rewards.

Solving this problem requires finding a policy (i.e. which direction to move at each state) that reaches the goal more than 78% of the time.
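Since the environment is stochastic, a policy’s success rate has to be estimated by running it for many episodes. Here is a minimal sketch of such an evaluation helper (the name evaluate_policy and the episode count are our own choices); it will also serve as the fitness function later:

import gym

env = gym.make("FrozenLake-v0")

def evaluate_policy(env, policy, n_episodes=100):
    # Fraction of episodes in which the policy reaches the goal.
    successes = 0.0
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done, _ = env.step(int(policy[state]))
        successes += reward  # reward is 1 only when the goal is reached
    return successes / n_episodes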

Random Search

Now, let’s see how far a random search for a good policy can get on this problem.

In this problem, we have 16 states and 4 possible moves. As a result, there exist 4^16 = 2^32 = 4,294,967,296 possible policies. Of course, it is not feasible to evaluate all of them, but we can generate a random set of solutions and select the best among them.

The script below generates a set of 2,000 random solutions and evaluates them. The best policy score we get is only around 0.33, which means that an agent applying it will reach the goal only about 33% of the time.
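A minimal version of this random search, reusing the env and evaluate_policy helper sketched above, might look like this:

import numpy as np

best_policy, best_score = None, 0.0
for _ in range(2000):
    # A random policy: one of the 4 moves for each of the 16 states.
    policy = np.random.randint(0, 4, size=16)
    score = evaluate_policy(env, policy)  # helper defined in the previous section
    if score > best_score:
        best_policy, best_score = policy, score

print(best_score)  # typically lands around 0.33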

Genetic Algorithm Solution

To solve this problem with a genetic algorithm, we encode each solution as an array of 16 values, where each value can be 0, 1, 2, or 3, representing the four possible moves at each of the 16 positions.

We generate an initial population of 100 random solutions, and we evolve it over 20 generations by performing selection, crossover, and mutation.
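A minimal sketch of this loop, again reusing env and evaluate_policy from above, might look like the following (the parent count and mutation rate are our own illustrative choices, not taken from the original script):

import random
import numpy as np

population = [np.random.randint(0, 4, size=16) for _ in range(100)]

for generation in range(20):
    scores = [evaluate_policy(env, p) for p in population]
    # Selection: keep the 20 fittest policies as parents.
    order = np.argsort(scores)[::-1]
    parents = [population[i] for i in order[:20]]
    children = []
    while len(children) < len(population):
        p1, p2 = random.sample(parents, 2)
        point = random.randint(1, 15)        # one-point crossover
        child = np.concatenate([p1[:point], p2[point:]])
        mutate = np.random.rand(16) < 0.05   # mutation with small probability
        child[mutate] = np.random.randint(0, 4, size=mutate.sum())
        children.append(child)
    population = children
    print(generation, max(scores))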

As a result, we can see that the best score in the initial population is 0.2, but after 20 generations we find a solution that achieves a success rate of more than 80%.

Link to the solution on OpenAI Gym: https://gym.openai.com/evaluations/eval_4VyQBhXMRLmG9y9MQA5ePA

Next time, we will explain the value-iteration and policy-iteration methods and demonstrate how they can solve another example reinforcement learning problem.
