minimax

Tejas
Becoming Human: Artificial Intelligence Magazine
6 min readOct 20, 2021

--

Enter Minimax

Minimax is a backtracking algorithm used in Game Theory and Artificial Intelligence to minimise the worst case potential loss. In better words, it is used to find an optimal strategy for two adversaries in a perfect information scenario. This algorithm is commonly used in two-player combinatorial games such as Chess and Go. The player employing this strategy considers the opponent’s every move and assigns a payoff to the state of the game succeeding the opponent’s move. He then picks the next optimal move by minimising the loss involved assuming the opponent selects the move with the least possible payoff (relative to the player).

Say we want to write a bot that can play the game, Tic-Tac-Toe. We need to work on three main functions,

  • Search Function
  • Heuristic Function (static evaluation on terminal nodes)
  • Minimax Function (recursive)
Think of Minimax as a function that takes in the state of the game and spits out a optimal move

Search Function

The search function looks at all future possible states of a game from the status quo, before deciding on picking the optimal move. There will be two players, the Maximiser (the bot) and the Minimiser (the human). This data is stored in a Game Tree, a data structure containing the possible game moves n time steps into the future.

Suppose the current state of a game is as shown below in the top most green circle.

A node is considered terminal when

  1. There is a winner, or
  2. There are no more moves to make, i.e Draw

Terminal nodes do not have children because you cannot make any more moves from a terminal (win, lose, draw) state.

Big Data Jobs

Heuristic Function (Static Evaluation)

Even for a simple game like Tic-Tac-Toe, there are a total of 3⁹ =19683 states of the game. It would be computationally inefficient to explore some of the branches knowing that some of the states would be illegal positions. To be able to cut off the search function at any target depth, a heuristic static evaluation function must be used to assign values to a certain node. This function will only be run on terminal nodes/when target depth is reached.

For this game, the heuristic function could assign a +1 score when the Maximiser wins, or -1 when the Minimiser wins or 0 when it's a draw.

Assigning scores to non-terminal nodes is not possible as we do not know who is winning in the context of this game. If it was chess, a heuristic function could be applied to a non terminal node, by subtracting the number of opponent pieces from the player’s, to get a general sense of who’s winning.

Notice that wins further down the tree have the same scores as those nearer to the root node.

We want to penalise wins further down the tree, as it is optimal to reach the win state earlier. We can do this by upscaling the rewards, say 10, and subtract the depth from the score of the evaluated node.

Trending AI Articles:

1. Why Corporate AI projects fail?

2. How AI Will Power the Next Wave of Healthcare Innovation?

3. Machine Learning by Using Regression Model

4. Top Data Science Platforms in 2021 Other than Kaggle

This is because since evaluations of the game state are from the perspective of the algorithm. Longer wins should be penalised, longer losses should be rewarded and vice versa.

Minimax Function

The pseudocode for the minimax function

The algorithm traverses from the terminal nodes to the root node, alternating between the minimiser’s and maximiser’s turn. The parent node (maximiser) picks the score that maximises its chances of winning → Highest evaluation. The scores are then passed up, as indicated by the blue arrows, to parent nodes who previously didn’t have a score.

Similarly, when it’s the minimiser’s turn, the parent node picks the score that minimises the chance of the bot winning → lowest score and this score is passed up.

Once the scores are passed up to the root node, the maximiser can go ahead and pick the move that leads to the the highest score, assuming that the opponent plays optimally.

Alpha — Beta Pruning

Pruning is a optimisation method implemented to speed up the search function, which i talked out about earlier. This is done by cutting off branches that are irrelevant to the outcome of the game.

Let’s say we have gotten to this stage of the tree, and the next step, without optimisation, would be to evaluate the last end state. We can optimise the run time of this algorithm by doing this:

Since the minimiser node will pick the left branch no matter what score the other branch spits out, it does not make computational sense to evaluate the score of the end state. Hence, that branch is pruned.

This may not seem like a lot, but the more complex a game tree gets, the more pruning opportunities there are.

To implement pruning in code, we can pass two additional parameters to the minimax function, alpha & beta.

and these two lines of code when calling minimax on each child of a node.

A whopping 97% decrease in amount of static evaluations performed.

A fatal flaw

The opponent is assumed to make the most optimal move every turn, and this may not hold especially when playing against sub-par opponents that make silly moves. When this happens, the algorithm’s assumption deviates from the actual opponent’s behaviour. However, it still leads to the desirable outcome of never losing. This would not be true for a MaxiMax algorithm, but that’s out of the scope of this article.

Checkout the full code for this article here:

GitHub — g-tejas/MiniMax: Algorithms from Scratch: MiniMax

In the likely event that I have not explained this well, this video by Sebastian Lague does an excellent job of explaining this topic in great detail.

Don’t forget to give us your 👏 !

--

--