A Quick Guide to Reinforcement Learning

What is reinforcement learning?

Seth Adler

Reinforcement learning (RL) is a machine learning technique in which an agent learns in an interactive environment by trial and error, using feedback from its own actions and experiences. Neural networks are responsible for recent breakthroughs in problems like computer vision, machine translation and time series prediction, and they can also be combined with reinforcement learning algorithms to create something as astounding as AlphaGo. To better understand reinforcement learning, let’s look at an analogy:

Let’s say you were dropped off on an isolated island. What would you do?

Initially, you would panic and be unsure of what to do, where to get food, how to live and so on. But after a while you would have to adapt: you must learn how to live on the island, adjust to the changing climate, and learn what to eat and what not to eat. You’re following the trial-and-error approach because you’re new to these surroundings, and the only way to learn is to gain experience and then learn from it.

This is what reinforcement learning is. It is a learning method in which an agent (you, stuck on the island) interacts with its environment (the island) by taking actions and discovering which ones lead to errors and which to rewards. Once trained, the agent can use what it has learned to choose good actions in situations it has not seen before.

The analogy above contains several key terms that you need to know in order to understand how reinforcement learning works:

  • Environment: The world, physical or simulated, in which the agent operates
  • State: The current situation of the agent
  • Reward: Feedback from the environment
  • Policy: The method that maps the agent’s state to actions
  • Value: The expected future reward that an agent would receive by taking an action in a particular state
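
To make these terms concrete, here is a minimal sketch of the loop in which an agent observes a state, picks an action, and receives a reward. The island states, the actions, and the reward numbers are all hypothetical and chosen only for illustration. Each episode lasts a single step, so the "value" here is simply the average immediate reward for each state and action.

```python
import random

# Hypothetical reward table for the island analogy: (state, action) -> reward.
REWARDS = {
    ("beach", "fish"): 1.0,
    ("beach", "forage"): -1.0,
    ("forest", "fish"): -1.0,
    ("forest", "forage"): 1.0,
}
STATES = ["beach", "forest"]
ACTIONS = ["fish", "forage"]


def run(episodes=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off in each state."""
    rng = random.Random(seed)
    value = {key: 0.0 for key in REWARDS}  # value estimate per (state, action)
    count = {key: 0 for key in REWARDS}    # how often each pair was tried

    for _ in range(episodes):
        state = rng.choice(STATES)  # the environment presents a state

        # Policy: mostly pick the best-known action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: value[(state, a)])

        reward = REWARDS[(state, action)]  # feedback from the environment

        # Update the running average reward for this (state, action) pair.
        count[(state, action)] += 1
        value[(state, action)] += (reward - value[(state, action)]) / count[(state, action)]

    # The learned policy maps each state to its highest-valued action.
    policy = {s: max(ACTIONS, key=lambda a: value[(s, a)]) for s in STATES}
    return policy, value


policy, value = run()
print(policy)
```

The agent is never told which action is correct. It only sees rewards, and the policy falls out of the value estimates it builds up.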

How does Reinforcement Learning Differ from Supervised/Unsupervised Learning?

Reinforcement learning differs from unsupervised learning in its goals. The goal in unsupervised learning is to find similarities and differences between data points. In reinforcement learning, the goal is to find good behavior (an action, or a label if you will, for each particular situation) that maximizes the long-term benefit the agent receives.

In order to compare reinforcement learning with supervised learning, let's think about an agent learning to play chess. In the supervised setting, the designer has to provide the correct label on a subset of situations. Imagine how cumbersome it would be to "give" the correct action in many situations to the agent. In the reinforcement learning problem, however, the designer is only expected to provide a reward signal. In the case of chess, it really is trivial: +1 for winning the game, -1 for losing the game, and 0 otherwise. It would then be the agent's job to assign credit to actions that led to the agent winning the game or the actions that messed it up!
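
As an illustration, the chess reward signal and one common way of spreading credit back over earlier moves can be sketched in a few lines. Discounted returns are an assumption here, as the text does not prescribe a particular credit-assignment method, and the function names and the discount factor `gamma` are hypothetical.

```python
def chess_reward(outcome):
    """Reward signal from the text: +1 for a win, -1 for a loss, 0 otherwise."""
    return {"win": 1.0, "loss": -1.0}.get(outcome, 0.0)


def discounted_returns(rewards, gamma=0.5):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step inherits a discounted share of later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A hypothetical four-move game that ends in a win: only the last move is rewarded.
rewards = [0.0, 0.0, 0.0, chess_reward("win")]
print(discounted_returns(rewards))  # [0.125, 0.25, 0.5, 1.0]
```

Moves closer to the win receive more credit, and earlier moves receive a discounted share. For a lost game the same pattern applies with negative values.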

5 Practical Uses of Reinforcement Learning:

The principles of reinforcement learning have found their way into robotics, where robots can be programmed to perform certain tasks and even to get better at them each day. The most common use of such industrial robots is to make companies’ manufacturing processes more efficient. Besides manufacturing, reinforcement learning can be adopted in four other business sectors. All five uses are described below:

Manufacturing


At Fanuc, a robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it memorizes the object, gains knowledge, and trains itself to do the job with great speed and precision.

Many warehousing facilities used by eCommerce sites and supermarkets use these intelligent robots to sort millions of products every day and to help deliver the right products to the right people. Tesla’s factory comprises more than 160 robots that do a major part of the work on its cars to reduce the risk of defects.

The German conglomerate Siemens has been using neural networks to monitor its steel plants and improve efficiencies for decades. The company claims that this practical experience has given it a leg up in developing AI for manufacturing and industrial applications. In addition, the company claims to have invested around $10 billion in US software companies (via acquisitions) over the past decade.




General Electric is the 31st largest company in the world by revenue and one of the largest and most diverse manufacturers on the planet, making everything from large industrial equipment to home appliances. It has over 500 factories around the world and has only begun transforming them into smart facilities. The goal of GE’s ‘Brilliant Manufacturing Suite’ is to link design, engineering, manufacturing, supply chain, distribution and services into one globally scalable, intelligent system. It is powered by Predix, GE’s industrial internet of things platform. In the manufacturing space, Predix can use sensors to automatically capture every step of the process and monitor each piece of complex equipment.

Inventory Management

A major issue in supply chain inventory management is coordinating the inventory policies adopted by different supply chain actors, such as suppliers, manufacturers and distributors, so as to smooth material flow and minimize costs while responsively meeting customer demand.

Reinforcement learning algorithms can be built to reduce the transit time for stocking and retrieving products in the warehouse, which helps optimize space utilization and warehouse operations.

In June 2016, Walmart announced testing of proprietary drones in its massive warehouses to improve inventory management. According to Walmart, manually checking inventory can take about a month for employees, but the same task can be completed in 24 hours using sophisticated drones that fly through the warehouse, scan items, and check for misplaced items. Walmart CEO Doug McMillon stated: “The internet of things, drones, delivery robots, 3D-printing and self-driving cars will allow retailers to further automate and optimize supply chains too.”

Delivery Management

Reinforcement learning is used to tackle the Split Delivery Vehicle Routing Problem. Here, Q-learning is used to serve the appropriate customers with just one vehicle.
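
The Q-learning update itself is compact. The sketch below is not a split-delivery routing solver. It is a toy example under stated assumptions: one vehicle on a straight route of five stops, with a single customer at the last stop. The stop layout, the reward values and the hyperparameters are all hypothetical.

```python
import random


def q_learning(n_stops=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical one-vehicle route.

    States are stop indices 0..n_stops-1; the customer waits at the last stop.
    Actions: 0 = drive back one stop, 1 = drive forward one stop.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_stops)]  # q[state][action]
    goal = n_stops - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = max(0, state - 1) if action == 0 else state + 1

            # Assumed reward: +10 on delivery, -1 for every other move.
            reward = 10.0 if next_state == goal else -1.0

            # Q-learning update: bootstrap from the best action in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    return q


q = q_learning()
for stop, (back, forward) in enumerate(q[:-1]):
    print(f"stop {stop}: back={back:.2f} forward={forward:.2f}")
```

After training, the forward action has the higher value at every stop, so the greedy policy drives straight to the customer. A real routing problem would need a much richer state (remaining demand, vehicle load, visited customers) than this toy uses.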

UPS uses an AI-powered GPS tool called ORION (On-road Integrated Optimization and Navigation) to create the most efficient routes for its fleet. Customers, drivers and vehicles submit data to the system, which then uses algorithms to create optimal routes. Instead of back-tracking or getting stuck in traffic, drivers using ORION can make their deliveries on time and in the most efficient manner. The routes can even be changed on the go depending on road conditions and other factors. Optimizing delivery routes has a huge impact on all areas of UPS’ business, from saving time and money to reducing emissions and wear and tear on its trucks. With ORION, UPS estimates it can reduce its delivery miles by 100 million. Those savings add up: UPS predicts that cutting just one mile from each driver’s daily route saves the company $50 million a year.

Power Systems

Reinforcement learning and optimization techniques are used to assess the security of electric power systems and to enhance microgrid performance. Adaptive learning methods are employed to develop control and protection schemes. Transmission technologies such as High-Voltage Direct Current (HVDC) and Flexible Alternating Current Transmission System (FACTS) devices, when based on adaptive learning techniques, can effectively help to reduce transmission losses and CO2 emissions.

Finance Sector

With the help of reinforcement learning, AI can be used to evaluate trading strategies. It is turning out to be a robust tool for training systems to optimize financial objectives. It has immense applications in stock market trading, where a Q-learning algorithm is able to learn an optimal trading strategy from one simple instruction: maximize the value of the portfolio.




One prominent company in the finance sector known for using reinforcement learning to place trades is JPMorgan.


