Artificial Intuition and Reinforcement Learning: The Next Steps in Machine Learning

--

Humans and machines differ in how they act in many ways. Yet recent developments in AI have produced not only more intelligent machines, but machines that appear to have developed a form of intuition. The insight comes from Google's DeepMind research and its AI system AlphaGo, which became a master of the ancient game of Go and defeated the best human players in the world. Its successor, AlphaGo Zero, then defeated AlphaGo itself, apparently by developing its own strategies through what looks like intuitive thinking, something long believed to belong only to humans, not computers.

The AlphaGo server rack (Source: Google)

Intuition has more to do with gut feeling than with calculated decision-making. Being intuitive is not the same as being intellectual; they are two different cognitive processes. Intelligence is based on what is known, while intuition deals with the unknown. Intuition draws on feelings, while intelligence follows logic. Humans can make decisions based on what they feel, not necessarily on what would be logical. Computers don't have emotions, so for a binary machine to act on a "hunch" when making decisions is remarkable. Yet that is often how a complex game like Go is won: defying strict calculation and betting on the possibility of an outcome. For example, an opponent's mistake may go unnoticed by a machine unless it can evaluate the position from something like a human perspective. Understanding what an opponent is trying to do, beyond what the rules of the game express, is intuitive thinking. If AlphaGo Zero can do this, it has some function that might be analogous to human intuition, yet it is a machine.

AlphaGo Zero, the successor to AlphaGo, has beaten its predecessor at its own game. AlphaGo is renowned for defeating the world's top players at Go, an ancient Chinese board game whose strategy demands intuitive thinking. Until recently, computers could not make decisions that looked intuitive. Then Google's DeepMind developed AlphaGo to play Go and, eventually, to develop its own strategies. It worked so well that even top Go players learned new things from it. The only thing that could beat AlphaGo was a newer version of itself, AlphaGo Zero. AlphaGo Zero's victory over AlphaGo marks a major advance in deep learning. Deep learning is a subset of machine learning, and within it falls another classification called reinforcement learning (RL), or "self-learning," which uses artificial neural networks (ANNs) to learn from data and make decisions.

After only three days of self-play, AlphaGo Zero was strong enough to defeat the version of itself that beat 18-time world champion Lee Se-dol, 100 games to nil. After 40 days, it had a 90 percent win rate against the most advanced version of the original AlphaGo software. DeepMind, the creator of AlphaGo and AlphaGo Zero, says this arguably makes it the best Go player in history, and it is not human.

Humans have a gut feeling that something is going to happen; that is intuition. When they are certain of something, that is knowledge based on intelligence.

This is an example of the "self-play reinforcement learning" that AlphaGo Zero utilized. It allowed the computer to train itself from scratch and surpass its predecessor in a remarkably short time. AlphaGo Zero played Go millions of times against itself without human intervention, meaning its machine learning was unsupervised. In effect, the neural network inside AlphaGo Zero creates its own "artificial knowledge." AlphaGo Zero learned through reinforcement, from sequences of actions that carried both rewards and consequences.

Basic RL is based on the Markov Decision Process (MDP). For every move AlphaGo or AlphaGo Zero makes, it evaluates the probabilities of possible outcomes, with the aid of powerful processors called TPUs arranged in an asynchronous, distributed mode. Asynchronous means a task does not have to wait for previous tasks to complete before executing; asynchronous tasks run in parallel. This matters for machine learning because of how AlphaGo Zero responds to new input data without being explicitly programmed to do so. AlphaGo Zero required no human intervention in the sense that it learned to play Go by playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.
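
To make the idea concrete, here is a minimal sketch of tabular Q-learning, one of the simplest RL algorithms built on an MDP. Everything in it is invented for illustration: the toy environment is a five-square line where reaching the rightmost square pays a reward, and the learning parameters are arbitrary. AlphaGo Zero's actual training uses deep neural networks and tree search at an entirely different scale.

```python
import random

# A toy Markov Decision Process: five states on a line, actions step left
# or right, and reaching state 4 pays a reward of 1. Purely illustrative:
# AlphaGo Zero's state space (Go positions) is astronomically larger, and
# it learns with deep networks and tree search rather than a lookup table.
STATES = range(5)
ACTIONS = [-1, +1]

def step(state, action):
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

# Tabular Q-learning: estimate the value of every (state, action) pair.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Move Q(s, a) toward the reward plus the discounted best next value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The learned policy: the best action in each non-terminal state (all +1, toward the goal).
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in STATES if s != 4})
```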

What AlphaGo Zero accomplished:

  1. Defeated the original AlphaGo 100 games to nil.
  2. Taught itself to play Go without human knowledge.
  3. Achieved world-class Go proficiency in three days.
  4. Did so with fewer hardware resources.
  5. Required less training (4.9 million games vs. 30 million).

AlphaGo Zero accomplished all of this with a simpler architecture that used only four TPUs, compared with the more than 48 TPUs the original required. This kind of deep learning, where a system trains itself to mastery of a specific task, can lead to applications well beyond a game like Go.

The process AlphaGo Zero went through includes the following steps (a runnable toy sketch of the loop follows the list):

  1. Self Play — Create a training set by playing against itself
  2. Retrain — Optimize the network weights on that training set
  3. Evaluation — Compare the new network's wins and losses against the current best
  4. Game State — Encode board positions as stacks that serve as input to the NN
  5. Deep NN — All moves are learned by AlphaGo Zero without human intervention
  6. MCTS (Monte Carlo Tree Search) — The technique AlphaGo Zero uses to determine its next move
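
Below is a runnable toy sketch of that self-play, retrain, and evaluate loop. The "game" (whoever picks the higher number wins) and the "network" (a plain table of move weights) are invented stand-ins, and none of the function names correspond to DeepMind's actual code; only the control flow mirrors the pipeline above, including the rule of promoting a candidate network only when it clearly beats the current champion.

```python
import random

# Runnable toy version of the self-play / retrain / evaluate loop above.
# Every piece here is a deliberately trivial stand-in for substantial
# machinery (MCTS, a deep residual network, distributed evaluation).

MOVES = [0, 1, 2]  # move 2 dominates this trivial game

def play_game(net_a, net_b):
    """One game: each side samples a move; higher move wins, ties are draws."""
    a = random.choices(MOVES, weights=net_a)[0]
    b = random.choices(MOVES, weights=net_b)[0]
    return 1.0 if a > b else 0.5 if a == b else 0.0

def self_play(net, num_games):
    """Step 1 (Self Play): build a training set by playing against itself."""
    games = []
    for _ in range(num_games):
        m = random.choices(MOVES, weights=net)[0]
        opp = random.choices(MOVES, weights=net)[0]
        games.append((m, m > opp))  # (move played, did it win?)
    return games

def retrain(net, games):
    """Step 2 (Retrain): nudge the move weights toward moves that won."""
    new = list(net)
    for move, won in games:
        new[move] = max(0.01, new[move] + (0.01 if won else -0.005))
    return new

def evaluate(candidate, champion, num_games):
    """Step 3 (Evaluation): head-to-head match; return the candidate's score rate."""
    return sum(play_game(candidate, champion) for _ in range(num_games)) / num_games

best = [1.0, 1.0, 1.0]  # uniform "blank slate" policy
for iteration in range(20):
    candidate = retrain(best, self_play(best, num_games=200))
    if evaluate(candidate, best, num_games=100) > 0.55:  # promote clear winners only
        best = candidate
print(best)  # the weight on the winning move grows over iterations
```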

Typically in supervised learning methods:

Y = f(X)

You learn a mapping function f from an input variable (X) to an output variable (Y) using labeled training data.
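
As a deliberately simple illustration of that idea, the sketch below learns a linear mapping f from a handful of made-up labeled (X, Y) pairs using ordinary least squares:

```python
# Made-up labeled data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

# Fit f by ordinary least squares: the "learning" step.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def f(x):
    """The learned mapping Y = f(X)."""
    return slope * x + intercept

print(f(6.0))  # predict Y for an unseen X (about 11.9 here)
```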

In unsupervised or “Self Learning” methods:

Association rules can state relationships in the form:

"If a person purchases item X, then they also purchase item Y," written as: X -> Y

For example, a person who buys milk and sugar is likely to also buy coffee powder, though not always. The rule emerges from many observed transactions, so we accept it as a probable outcome.

{milk,sugar} -> coffee powder
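
Here is a small sketch of how such a rule can be checked against transaction data by counting support and confidence; the five shopping baskets are invented for illustration:

```python
# Invented shopping transactions for the {milk, sugar} -> coffee powder example.
transactions = [
    {"milk", "sugar", "coffee powder"},
    {"milk", "sugar", "coffee powder"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "coffee powder"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent = {"milk", "sugar"}
consequent = {"coffee powder"}

# Confidence: of the baskets containing the antecedent, how many also
# contain the consequent? 2 of 3 here, so about 0.67 -- the rule usually
# holds, but not always, matching the caveat above.
confidence = support(antecedent | consequent) / support(antecedent)
print(round(confidence, 2))
```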

AlphaGo Zero uses associations like these, learned from playing against itself, to make what amounts to intuitive decisions.

The AlphaGo Zero Cheat Sheet (Source: applied-data.science/static/main/res/alpha_go_zero_cheat_sheet.png)

According to Dr. David Silver, lead researcher on AlphaGo Zero: “By not using human data — by not using human expertise in any fashion — we’ve actually removed the constraints of human knowledge. It’s therefore able to create knowledge itself from first principles; from a blank slate […] This enables it to be much more powerful than previous versions.”

On December 5, 2017, DeepMind released AlphaZero, the successor to AlphaGo Zero. After 34 hours of learning, using only four TPUs on a single machine, it defeated AlphaGo Zero at Go. While games like Go require strategy, it is intuition that makes the game unique. Somehow, AlphaGo Zero and AlphaZero developed their own machine intuition that allowed them to play far better than their predecessors.

Perhaps the lesson of AlphaGo, AlphaGo Zero, and AlphaZero is how AI can help humans learn better. The original AlphaGo revealed so many weaknesses in human play that Go masters said it changed the way they looked at the game. Had that encounter never happened, those new insights into Go would never have emerged. Many more insights could follow that improve how we do things. Who knows, maybe there will even be a better way to make the perfect pizza.

One reason we want machines to think intuitively is safety. A prime example of applied artificial intuition is self-driving cars, or autonomous vehicles. These systems function through onboard sensors and AI software, yet they are still prone to accidents on occasion. With a form of artificial intuition, self-driving cars could anticipate the unpredictable things that happen on the road. For example, in rainy weather, a self-driving car is programmed to slow down and turn on its wipers based on its sensors. With artificial intuition, it could also be trained to anticipate the dangers ahead by doing things human drivers sometimes do, such as pulling to the side of the road if the rain gets worse or taking a safer route to the destination. Combined with reinforcement learning, that could make for a more robust system.

Self-driving cars with artificial intuition could make split-second decisions that increase public safety, but they will require plenty of testing and regulation.

Artificial intuition and reinforcement learning do not mean machines will suddenly have minds of their own. Rather, these machine learning techniques can train systems to gather insights that go beyond the basics. Most training models begin as basic ways to train a system; as the system becomes more advanced, these techniques feed it more information from which it builds further insights. In this way, the system acquires knowledge without requiring proof or conscious reasoning. A machine can simply anticipate what is about to happen and act on it, much as humans make critical decisions: we learn something, integrate it, and it becomes second nature. The same concept can be used to train machines to act in the face of uncertainty.
