Reinforcement Learning (RL) is one of the most exciting fields of machine learning today, and also one of the oldest. It has been around since the 1950s, producing many exciting applications over the years, particularly in games (e.g., TD-Gammon, a Backgammon-playing program) and in machine control, but seldom making the headline news.
But a revolution took place in 2013, when researchers from a British startup called DeepMind demonstrated a system that could learn to play just about any Atari game from scratch, eventually outperforming humans in most of them, using only raw pixels as inputs and without any prior knowledge of the rules of the games.
This was the first of a series of amazing feats, culminating in March 2016 with the victory of their system AlphaGo against Lee Sedol, a legendary professional player of the game of Go, and in May 2017 against Ke Jie, the world champion. No program had ever come close to beating a master of this game, let alone the world champion. Today the whole field of RL is boiling with new ideas and a wide range of applications. Google bought DeepMind for over $500 million in 2014.
So how did DeepMind achieve all this?
In hindsight, it seems rather simple: they applied the power of Deep Learning to the field of Reinforcement Learning, and it worked beyond their wildest dreams. In this article, I will explain what Reinforcement Learning is and what it’s good at.
How Reinforcement Learning Works
In Reinforcement Learning, a software agent makes observations and takes actions within an environment, and in return, it receives rewards. Its objective is to learn to act in a way that will maximize its expected rewards over time. If you don’t mind a bit of anthropomorphism, you can think of positive rewards as pleasure, and negative rewards as pain (the term “reward” is a bit misleading in this case). In short, the agent acts in the environment and learns by trial and error to maximize its pleasure and minimize its pain.
This is quite a broad setting, which can apply to a wide variety of tasks. Here are a few examples:
- The agent can be the program controlling a robot. In this case, the environment is the real world. The agent observes the environment through a set of sensors, such as cameras and touch sensors. Its actions consist of sending signals to activate motors. It may be programmed to get positive rewards whenever it approaches the target destination and negative rewards whenever it wastes time or goes in the wrong direction.
- The agent can be the program controlling Ms. Pac-Man. In this case, the environment is a simulation of the Atari game. The actions are the nine possible joystick positions (upper left, down, center, and so on). The observations are screenshots, and the rewards are just the game points.
- Similarly, the agent can be the program playing a board game such as Go.
- The agent does not have to control a physically (or virtually) moving thing. For example, it can be a smart thermostat, getting positive rewards whenever it is close to the target temperature and saves energy, and negative rewards whenever humans need to tweak the temperature, so the agent must learn to anticipate human needs.
- The agent can observe stock market prices and decide how much to buy or sell every second. Rewards are the monetary gains and losses.
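The observe–act–reward loop described above can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions, not DeepMind's method: a hypothetical two-armed "bandit" environment (arm 1 pays off more often than arm 0) and an epsilon-greedy agent that learns by trial and error which action maximizes its reward.

```python
import random

# Hypothetical toy environment: a two-armed bandit.
# Arm 0 pays a reward of 1 with probability 0.2; arm 1 with probability 0.8.
class TwoArmedBandit:
    def __init__(self, probs=(0.2, 0.8), seed=42):
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Return the reward for the chosen action (pulling one arm).
        return 1.0 if self.rng.random() < self.probs[action] else 0.0

# A simple epsilon-greedy agent: usually exploit the best-known arm,
# occasionally explore a random one.
class EpsilonGreedyAgent:
    def __init__(self, n_actions=2, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions  # running estimate of each arm's payoff
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))      # explore
        return max(range(len(self.values)),
                   key=lambda a: self.values[a])             # exploit

    def learn(self, action, reward):
        # Incremental average: nudge the estimate toward the observed reward.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

env = TwoArmedBandit()
agent = EpsilonGreedyAgent()
for _ in range(1000):
    action = agent.act()          # the agent acts in the environment...
    reward = env.step(action)     # ...the environment returns a reward...
    agent.learn(action, reward)   # ...and the agent updates its estimates.

print(agent.values)  # the agent's learned estimate of each arm's payoff
```

After a thousand trial-and-error steps, the agent's estimate for the better arm ends up higher, so it pulls that arm most of the time: exactly the maximize-expected-reward behavior described above, in miniature.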
I hope you liked this article on Reinforcement Learning. Feel free to ask questions on this topic or any topic you want in the comments section below.