Reinforcement learning: A gentle introduction and industrial application
Machine learning can be implemented in different ways, one of which is reinforcement learning. What exactly is reinforcement learning, and how can we put it to use? Before the upcoming ML Conference, we spoke to Dr. Christian Hidber about the underlying ideas and challenges of reinforcement learning, and why it is well suited to industrial applications.
JAXenter: For those who are not familiar with this term, what is the basic idea behind reinforcement learning?
Christian Hidber: With reinforcement learning, computers learn complex behaviors through clever trial-and-error strategies. This is very much like a child learning a new game: They start by pressing some random buttons and see what happens. After a while, they continuously improve their gaming strategy and get better and better. Moreover, you don’t have to explain to a child how the game works, as it’s part of the fun to figure it out. Reinforcement learning algorithms essentially try to learn by mimicking this behavior.
JAXenter: Reinforcement learning does not require large data sets for training. How is this accomplished?
Christian Hidber: These algorithms learn through the interaction with an environment. In the game example above, the game engine containing all the rules of the game is the environment. The algorithms observe which game sequences yield good results and try to learn from them. In a sense, reinforcement learning generates its dataset on the fly from the environment, creating as much training data as needed – pretty neat!
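This trial-and-error loop can be made concrete with a minimal sketch. The code below is a hypothetical illustration (not the interviewee's actual setup): tabular Q-learning on a toy five-cell corridor, where the agent generates its own training data simply by interacting with the environment, exactly as described above.

```python
import random

# Toy environment: a corridor of 5 cells. The agent starts in cell 0
# and receives a reward of +1 only when it reaches the rightmost cell.
N_STATES = 5
ACTIONS = [-1, +1]  # move left or move right

def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

# Tabular Q-learning: the value table starts empty; trial and error
# generates the training data on the fly.
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # Explore with probability epsilon, otherwise exploit what was learned.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: q[state][i])
        next_state, reward, done = step(state, ACTIONS[a])
        # Update the value estimate from this single interaction.
        q[state][a] += alpha * (reward + gamma * max(q[next_state]) - q[state][a])
        state = next_state

# Greedy policy for the non-terminal states (1 means "move right").
policy = [max((0, 1), key=lambda i: q[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

After a couple hundred episodes of random fumbling, the learned greedy policy moves right in every state, even though the agent was never told the rules of the "game".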
JAXenter: How well does the accuracy of reinforcement learning solutions fare compared to other types of machine learning?
Christian Hidber: Reinforcement learning addresses problems that are hard for other types of machine learning to solve, and vice versa. Thus, you rarely find yourself in a situation where you can compare their accuracies directly. The accuracy of a reinforcement learning solution may vary a lot for the same problem, depending on your model, data and algorithm choices. In that respect it is quite similar to classic machine learning approaches.
Reinforcement learning is very much like a child learning a new game.
JAXenter: In your talk, you give an insight into how you applied reinforcement learning to the area of siphonic roof drainage systems. Why did you choose it over other machine learning methods?
Christian Hidber: Actually, we use reinforcement learning in a complementary fashion. Our calculation pipeline uses traditional heuristics as well as supervised methods like neural networks and support vector machines. At a certain point, we realized, and could in fact prove, that we were unable to improve our classic machine learning solution any further. By adding reinforcement learning as an additional stage in our pipeline, we reduced our previous failure rate by more than 70%.
JAXenter: In which areas might reinforcement learning play a central role in the future?
Christian Hidber: There are already quite a few real-world applications out there in production, like cooling a data center or controlling robot movements. Personally, I think that reinforcement learning is particularly great for industrial control problems. In these cases, we can often simulate the environment, but there is no clear-cut way to find a good solution. That was also the setup in our hydraulic optimization problem. So, I expect to see many more industrial applications.
JAXenter: Can you think of any typical mistakes that may happen when starting to work with reinforcement learning?
Christian Hidber: Oh, yes, absolutely, since we made a lot of mistakes ourselves. Some of them resulted in very funny and surprising strategies. There is always a strong temptation to put a lot of cleverness into the reward function. The reward function is responsible for defining which outcome is considered "good" and which "bad". The algorithms are incredibly smart at finding shortcuts and loopholes, producing high rewards for behaviors which are definitely "bad". It seems that the more cleverness you put into the reward function, the more surprises you get out of it.
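A tiny hypothetical example shows how such a loophole can arise. Suppose an agent should walk from cell 0 to cell 10, and a well-meaning reward function pays a small bonus for every step taken toward the goal, plus a bonus for finishing, but never penalizes steps away from it. The numbers and reward scheme below are invented for illustration only:

```python
# Hypothetical illustration of a reward-function loophole.
# Intended behavior: walk from cell 0 to cell 10 and stop.
# The "clever" reward pays +0.1 per step toward the goal and +1 for
# finishing -- but it never penalizes steps away from the goal.

def episode_reward(actions, goal=10, max_steps=100):
    pos, total = 0, 0.0
    for i in range(max_steps):
        a = actions[i % len(actions)]  # cycle through the action pattern
        if a == +1 and pos < goal:
            total += 0.1               # shaping bonus for moving toward the goal
        pos = min(max(pos + a, 0), goal)
        if pos == goal:
            return total + 1.0         # finishing bonus
    return total

honest = episode_reward([+1])        # walk straight to the goal: 10 * 0.1 + 1.0
loophole = episode_reward([+1, -1])  # oscillate forever to farm the shaping bonus

print(honest, loophole)
```

The oscillating strategy collects more total reward than the honest one while never reaching the goal at all, which is exactly the kind of "definitely bad but highly rewarded" behavior described above.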
JAXenter: What do you expect to be the main takeaway for attendees of your talk?
Christian Hidber: My goal is to give the attendees a good intuition on how these algorithms work. The attendees may then decide on their own whether a problem at hand might be suitable for reinforcement learning or not. And of course, if an attendee already has an idea for an application, I would be more than delighted to hear about it.
JAXenter: Thank you very much!