REINFORCEMENT LEARNING – Another able branch of Artificial Intelligence

‘Reinforcement Learning’ is a field of machine learning within artificial intelligence. Inspired by behaviorist psychology, it enables software agents and machines to work out the right behavior for a situation and act accordingly, ultimately maximizing their performance. Put simply, computers learn by experimenting: they try things, receive responses from the environment on how well they did, and keep adapting, getting better each time. For example, computers are trained to play games, schedule jobs (such as elevator scheduling) and control limbs.

Reinforcement Learning (RL)

The trial-and-error learning behind RL was documented more than 100 years ago by psychologist Edward Thorndike. Rather than having a programmer tell it what to do, this technology lets the computer/software agent perform tasks on its own, slowly figuring out the best way. The interaction takes place between two elements – the environment and the learning agent. Along the way, the environment rewards the agent; this reward is known as the reinforcement signal. Based on the rewards, the agent uses what it has learned to choose its next action. In essence, computers learn like people do, without the need for explicit training. Punishments do happen along the way for the artificial agent, but through constant trial and error, agents learn and arrive at the best method (based on raw inputs).

Selections are made continuously, separating the good actions from the bad. The value of each choice can be represented by a Q-network, which estimates the total reward an action is expected to bring. The technology is now becoming more powerful in combination with deep learning, which uses a large simulated neural network to identify patterns/trends in data and carry out the computer’s learning tasks.
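To make the idea of estimating total reward concrete, here is a minimal sketch of the update rule used in Q-learning. It is a simplification: the estimates live in a plain lookup table rather than in a Q-network, and the state names, action names, learning rate and discount factor are all assumed values chosen for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor for future rewards (assumed value)


def q_update(q, state, action, reward, next_state, actions):
    """Apply one tabular Q-learning update and return the new estimate."""
    # Best total reward currently expected from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: the reward just received plus the discounted future estimate.
    target = reward + GAMMA * best_next
    # Move the old estimate part of the way toward the target.
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)          # every estimate starts at 0.0
actions = ["left", "right"]     # hypothetical action names
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(new_value)
```

A Q-network plays the same role as the table `q`, except that a neural network produces the estimates, which helps when there are too many states to list one by one.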

One of the best examples is AlphaGo, a program developed by DeepMind, a subsidiary of Alphabet, which went on to beat one of the world’s best human players at the board game Go in 2016. This made the world sit up and recognize RL’s significance, as it was practically impossible to hand-code a strategy for a game as complex as Go. Similarly, for other large and complex tasks, explicit programming becomes unworkable. Beyond self-driving cars, which are expected to apply RL with safety and precision, this technology can also be used for robots (without manual programming) and can figure out the configuration required for the equipment in a data center. Other players in RL are Mobileye, OpenAI, Google, and Uber. Google and DeepMind also worked together to make Google’s data centers more energy efficient. This was made possible through an RL algorithm which can learn from collected data, experiment through simulation and finally suggest when and how the cooling systems should be operated.

Steps of ‘cause and effect’ for an RL Agent

  • The artificial agent detects the input state (the problem is first identified and formulated).
  • The agent’s strategy determines the next action to take.
  • The action is then performed, and a reward or punishment is provided as reinforcement.
  • The resulting state is recorded.
  • Finally, the choice of best action is adjusted to improve results.
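The five steps above can be sketched as a short training loop. This is an illustration under stated assumptions, not a reference implementation: the `Corridor` environment (five cells with a goal at the far end), the reward values and the learning parameters are all hypothetical, and the adjustment step uses a simple Q-learning update.

```python
import random


class Corridor:
    """Hypothetical environment: cells 0..4, with the goal at cell 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the move.
        self.position = min(4, max(0, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else -0.1  # small punishment for each extra step
        return self.position, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    actions = (-1, 1)
    q = {(s, a): 0.0 for s in range(5) for a in actions}
    for _ in range(episodes):
        state = env.reset()                      # step 1: detect the input state
        done = False
        while not done:
            if rng.random() < epsilon:           # step 2: the strategy picks an action
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            # step 3: perform the action and receive a reward or punishment
            next_state, reward, done = env.step(action)
            # step 5: adjust the estimate for the action just taken
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state                   # step 4: record the resulting state
    return q


q = train()
# Read off the preferred action in each non-goal cell.
policy = [max((-1, 1), key=lambda a: q[(s, a)]) for s in range(4)]
print(policy)
```

After training, the preferred action in every cell should be `+1`, meaning the agent has learned to walk straight to the goal without being told how.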

Unsupervised, Exploitation and Exploration of RL Systems

RL is often described as a form of unsupervised learning, since the agent is given no labeled examples; it is left to learn in the environment provided and improves by gradually adjusting. Strictly speaking, it is usually treated as its own category, because the agent does receive feedback in the form of rewards. Beyond this, the RL agent learns through exploitation and exploration. Exploitation means that once the agent has achieved a satisfactory result and been rewarded, it can reuse the same technique to achieve results again. Exploration means that the agent tries different strategies which could result in better rewards. The two must be balanced: too much exploitation can leave better options undiscovered, while too much exploration wastes effort on poor ones.
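One common way to combine the two is an epsilon-greedy rule: most of the time the agent exploits the option that currently looks best, and with a small probability it explores a random one. The sketch below is only an illustration; the three payout probabilities, the step count and the value of `epsilon` are assumptions made up for the example.

```python
import random


def epsilon_greedy_bandit(payout_probs, steps=5000, epsilon=0.1, seed=1):
    """Play a hypothetical slot machine with several arms using epsilon-greedy."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)    # how often each arm was pulled
    values = [0.0] * len(payout_probs)  # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm at random.
            arm = rng.randrange(len(payout_probs))
        else:
            # Exploit: pick the arm with the best average so far.
            arm = max(range(len(payout_probs)), key=lambda i: values[i])
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


counts, values = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values])
```

With these settings the arm that pays out most often should end up with by far the most pulls, while the occasional random pulls keep the estimates for the other arms from going stale.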


There are limitations to RL too. Storing a value for every situation the agent may face can demand a great deal of memory, and the cost grows as the problem itself becomes more complex. Moreover, similar behaviors recur often, so modularity has to be introduced to avoid learning them repeatedly. There is also the limiting factor of perception (perceptual aliasing), where different situations look identical to the agent, which ultimately affects the functioning of the algorithm.
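A rough calculation shows how quickly the memory cost can grow when values are kept in a lookup table. The figures are illustrative assumptions only: each state is described by a number of variables with 10 possible values each, the agent has 4 actions, and every stored value takes 8 bytes.

```python
def q_table_entries(values_per_variable, num_variables, num_actions):
    """Number of values a lookup table needs: one per (state, action) pair."""
    num_states = values_per_variable ** num_variables
    return num_states * num_actions


for n in (2, 5, 10):
    entries = q_table_entries(values_per_variable=10, num_variables=n, num_actions=4)
    gigabytes = entries * 8 / 1e9  # assuming 8 bytes per stored value
    print(f"{n} variables: {entries} entries, about {gigabytes:.6f} GB")
```

Going from 2 to 10 state variables takes the table from a few hundred entries to tens of billions, which is one reason large problems tend to rely on approximations such as neural networks instead of tables.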

The Business Benefits

RL ultimately comes down to machine learning algorithms that maximize performance. It can be widely used in:

Manufacturing – Robots use RL to pick goods and place them in the right position – once they have learned to do it correctly, they repeat the method with precision;

Inventory management – Space utilization is imperative for e-commerce companies and retailers – RL allows for algorithms which can decrease the time needed to stock and retrieve products, enhancing warehouse operations;

Finance – RL aids in evaluating trading strategies and optimizing financial goals;

Delivery management – RL helps solve the Split Delivery Vehicle Routing problem – Q-learning can be used to assign a single vehicle to the appropriate customers;

Dynamic pricing – RL supports the optimization of dynamic pricing strategies based on demand, supply and interaction with customers;

E-commerce personalization – RL assists in analyzing consumer behavior and tailoring products and services to customers’ interests;

Medical industry – RL algorithms address the dynamic treatment regime (DTR) problem, processing clinical data to decide on a treatment strategy on the basis of the patient’s inputs.

RL is indeed innovative and goal-oriented, with an emphasis on learning from interactions with the environment in ways that can drive business value. It may well be what keeps realistic artificial intelligence afloat.

karthikeya Boyini
