What is Q-learning with respect to reinforcement learning in Machine Learning?

Q-learning is a reinforcement learning algorithm in which an ‘agent’ learns, by trial and error, which actions lead to the best long-term outcome.

Reinforcement learning is a machine learning paradigm in its own right, distinct from supervised, unsupervised and semi-supervised learning. It does not need a labelled input dataset. If previously recorded interactions are available, the algorithm can learn from them; otherwise it learns from its own experience of interacting with its surroundings.

When the ‘reinforcement agent’ performs an action, it is rewarded or punished based on how good that action was, for example whether it took the right path or the least expensive one. The size of the reward or punishment depends on the problem at hand.

If the ‘reinforcement agent’ gets a reward, it becomes more likely to act in the same way again. If the agent is punished, it learns that the action it took was not correct or optimal, and that it needs to find better paths or outputs.

The reinforcement agent interacts with its surroundings and chooses its actions so that the total reward it collects over time is maximized.

To understand this better, let us take the example of a game of chess. Every player takes actions with the aim of winning, that is, of checkmating the opponent. The ‘agent’ moves the chess pieces, and every move changes the state of the board. We can visualize the game as a graph whose vertices are board configurations and whose edges are moves: the ‘agent’ travels from one vertex to another along an edge.

Q-learning uses a Q-table that helps the agent decide which move to take next. The Q-table consists of rows and columns, where each row corresponds to a chess board configuration (a state) and each column corresponds to a possible move (an action) that the agent could take. Each cell holds a Q-value: the total reward the agent expects to receive if it takes that action in that state and keeps acting well afterwards.
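As a minimal sketch of this data structure, the Q-table can be held as a dictionary keyed by state, with one Q-value per action. The state name "s0" and the two actions are hypothetical toy values chosen for illustration, not chess positions, since a real chess board has far too many configurations to list in a table.

```python
from collections import defaultdict

# Hypothetical toy action set; a chess agent would have many more moves.
ACTIONS = ["left", "right"]

def make_q_table():
    # Any state not seen before gets a row with a Q-value of 0.0 per action.
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def best_action(q_table, state):
    # Greedy choice: the action with the highest Q-value in this state's row.
    row = q_table[state]
    return max(row, key=row.get)

q_table = make_q_table()
q_table["s0"]["right"] = 1.5        # pretend the agent has learned this value
print(best_action(q_table, "s0"))   # right
```

Looking up a row gives all the Q-values for one state, and picking the largest entry in that row is how the agent turns the table into a decision.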

How does it work?

The algorithm proceeds as follows.

Before training starts, the Q-table is initialized with arbitrary values (commonly all zeros or small random numbers).

Next, for every episode −

  • The initial state of the agent is observed
  • For every step in the episode,
    • An action is selected using a policy derived from the Q-table (usually the action with the highest Q-value, with an occasional random action for exploration)
    • The agent performs the action, observes the reward it receives, and moves to a new state
    • The Q-value for that state and action is updated in the Q-table using the ‘Bellman equation’

This continues until the terminal state of the episode is reached.
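For reference, the Q-value update in the steps above is usually written in the following standard form. The learning rate α and the discount factor γ are symbols introduced here for illustration; the text above does not name them.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

Here s_t and a_t are the current state and action, r_{t+1} is the reward observed after the action, and s_{t+1} is the new state. The term in brackets is the gap between the new estimate and the old Q-value, and α controls how far the table entry moves towards the new estimate.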

Note − In our example, one episode corresponds to one entire game of chess. More generally, an episode is one complete run of the problem at hand, from the initial state to a terminal state.
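The episode loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: chess is replaced by a hypothetical five-cell corridor in which the agent starts at cell 0 and is rewarded for reaching cell 4, the policy is epsilon-greedy, the table starts at zero, and the values of ALPHA, GAMMA and EPSILON are arbitrary choices.

```python
import random

N_STATES = 5        # cells 0..4; cell 4 is the goal (hypothetical toy world)
ACTIONS = [0, 1]    # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration rate

def step(state, action):
    """Move along the corridor; the reward is 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])  # random tie-break

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: rows = states, columns = actions
    for _ in range(episodes):
        state, done = 0, False                  # observe the initial state
        while not done:
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # No future value is added once the episode has ended.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q = train()
greedy = [row.index(max(row)) for row in q[:-1]]   # best action in each non-goal cell
print(greedy)   # [1, 1, 1, 1], i.e. always step right
```

After training, the Q-values for stepping right grow as the agent gets closer to the goal, which is how the reward at the end of an episode propagates back to earlier states.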

Updated on: 10-Dec-2020
