Exciting world of Reinforcement Learning

A case for consumer businesses

Photo by Preethi Viswanathan on Unsplash

Ever since I got curious about and hooked on the field of reinforcement learning and its numerous industry applications, my excitement for the field has only grown stronger by the day. Here, I'd like to share some of what I've learned about the potential applications of reinforcement learning (RL) for consumer businesses. But before I dive into the details, a quick introduction to RL for ML practitioners who are new to the subject.

RL is a branch of machine learning in which a smart agent learns to achieve a goal through trial and error in an environment; at the end of training, we have an agent that can pursue that goal independently in the real world. If you are familiar with the other types of ML, supervised and unsupervised learning, this might sound very similar to the supervised learning approach. But the big difference between the two (among others) is that RL does not require any explicit labels to be provided, unlike supervised learning techniques. For more details and context, you may read the many blogs and articles on RL (there are several on TDS/Medium alone). You could also look up some of the groundbreaking work done by DeepMind and OpenAI to learn more about the field's accomplishments over the years, and read the book 'Reinforcement Learning: An Introduction' by Richard S. Sutton and Andrew G. Barto to learn how the field came about.

Among the many exciting applications of RL, my search focused on personalisation use cases for consumer businesses. While my use cases are drawn from the media and publishing industry, they could easily be extended to other industries such as e-retail, travel and hospitality, etc. Toward the end, we'll look at the broad contours of an RL solution that can accomplish these use cases.

a) Newsletter delivery personalisation — One of the primary sources of traffic for any media and publishing firm is newsletters. We often find that the newsletters from our favourite daily or weekly newspapers and magazines reach us at the same time regardless of when we want to read them. In other words, it's not uncommon for newsletters to get blasted out to all users at the same time or day of the week. In the current era of digitisation, this need not be the case. The ideal solution would be to send each newsletter at the time when it is most likely to be opened by the user. RL could be used to send the email at the optimal time for each user, driving personalised experiences for readers.

An antique typewriter that sits on a wooden table
Photo by Markus Winkler on Unsplash

b) NL capacity identification — Another issue marketers often grapple with is identifying the optimum number of newsletters (NLs) to send a subscriber: how many emails per user is too many? It's common knowledge that NL appetite varies from user to user, and it is not always the same over time. Yet we are used to sending the same number of emails to all subscribers all the time. I admit there is no easy way to dynamically determine this magic number for every user. But with the application of RL techniques, this is a problem that could be solved.

c) Personalised box subscriptions — Box subscriptions are subscription products designed so that each subscription issue consists of a certain assortment of products. For example, a monthly beauty box might contain a random assortment of beauty products for the face, skin, hair, etc. The next month's issue could be a totally different assortment. Note that the subscriber has no say in selecting the products in this model, and the only feedback from the user is whether they renew the subscription. The major challenge in this problem is identifying the right product mix that maximises subscriber retention.

Formulating this as an RL problem, we could determine the optimal assortment for every subscription issue, personalised for each user, while maximising subscriber retention.

Box subscription — Photo by BATCH by Wisconsin Hemp Scientific on Unsplash

d) Dynamic paywall metering — In the digital media and publishing industry, one of the key decisions publishers need to make involves a tradeoff: making revenue through ads by allowing users to read articles for free, versus making revenue through subscriptions by blocking free access with a digital paywall (after a certain number of free articles) to induce the user to subscribe. The call to action from a paywall could be to subscribe, or to get the reader to register in order to read further. Usually the paywall meter is set to a fixed 2, 4, or 6 free articles per month for all users.

Photo by Annie Spratt on Unsplash

But such an implementation is not optimal. A loyal reader of a brand would continue to read more articles, contributing more ad revenue; cutting off this user's readership at just 4 articles per month prematurely forgoes potential ad revenue. Ideally, we could bring up the paywall for such a user after maybe 6-7 articles per month. On the other hand, a less engaged user who is unlikely to return for a second article does not need a 4-article limit; since this user is unlikely to generate much ad revenue, we could raise the paywall as early as the second visit and push this user to subscribe.

Instead of setting up such manual rules for each user, RL could learn each user's reading pattern and recommend an optimal paywall limit to maximise the revenue potential of each user. Moreover, its learning adapts to each user's changing reading behaviour over time and automatically adjusts the paywall limit to maximise revenue for the business.
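To make the tradeoff concrete, the episode reward for a paywall-metering agent could be sketched as below. The per-article ad yield and the subscription bonus here are made-up placeholder numbers, not figures from any real business:

```python
# Hypothetical reward shaping for a paywall-metering agent.
# Both constants are illustrative placeholders.
AD_REVENUE_PER_ARTICLE = 0.01   # assumed ad yield per free article read
SUBSCRIPTION_BONUS = 5.0        # assumed value of a conversion event

def paywall_reward(articles_read_free, subscribed):
    """Reward at the end of an episode (e.g. one month) for a single reader."""
    r = articles_read_free * AD_REVENUE_PER_ARTICLE
    if subscribed:
        r += SUBSCRIPTION_BONUS
    return r
```

With a reward like this, the agent is naturally pushed toward a higher free-article limit for loyal readers (more ad revenue per episode) and an earlier paywall for one-off visitors (where the subscription bonus is the only revenue worth chasing).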

Now that we have looked at the use cases, let me give a sneak peek into the RL solution design for one of them. We'd solve this problem using an RL algorithm called DQN (Deep Q-Network), which combines the principles of deep learning and Q-learning. I presume most ML practitioners are familiar with deep learning. Q-learning belongs to a class of RL solutions called tabular methods, which aim to learn a Q-value for each state-action pair. (The Q-value of a state-action pair is the expected cumulative discounted reward the agent can collect from that state onward when it takes that action.) This is an elegant solution for problems with small, finite state spaces, such as the Frozen Lake problem. However, for larger state spaces this approach gets unwieldy, and we need an approximate way of estimating values; this class of solutions is called 'approximate methods'. DQN is the most popular algorithm among the approximate methods.
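To see what tabular Q-learning looks like before the deep-learning version, here is a minimal sketch on a toy problem: a five-state corridor where the agent earns +1 for reaching the right end. The environment and hyperparameters are invented for illustration:

```python
import random
from collections import defaultdict

# Toy corridor: states 0..4, actions move left/right, +1 on reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]          # left, right
alpha, gamma, eps = 0.5, 0.9, 0.1

Q = defaultdict(float)      # Q[(state, action)] -> learned value

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the table, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q toward reward + discounted best next value
        target = r + (0.0 if done else gamma * max(Q[(s2, a_)] for a_ in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```

After training, the greedy policy moves right from every state; the table has one entry per state-action pair, which is exactly what stops scaling once the state space gets large.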

In DQN, the deep learning network serves as a function approximator that estimates the value of a given state (or state-action pair). The solution design, algorithm, and setup would be the same for all the use cases, but the configuration of the MDP (Markov Decision Process), i.e. the state space, rewards, and actions, would vary for each use case.

The MDP configuration for use case a) would be as follows.

States — The NL open/click pattern for the last 1-2 months.
Actions — The 24 hours of the day. This could be reduced to 12 action values, with each action representing a 2-hour window in which the email could be sent.
Reward — +2 for an NL click, +1 for an NL open, 0 otherwise

I'm also sharing a link to a Medium blog post (by Mehdi Ben Ayed and Patrick Halina from Zynga's ML engineering team) that explains how they solved the problem of customising app notifications. It was a very useful reference and motivation for validating the DQN approach for these use cases.

I hope you found the above business use cases for the application of RL insightful and useful. I'd like to follow up this article with a few use cases for the ad/commercial business, and a few learning resources I found useful for building RL capability within my team.


Exciting world of Reinforcement Learning was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


