Reinforcement Learning (RL) is a machine learning method, distinct from both supervised and unsupervised learning, that is concerned with how agents should take actions in an environment. Unlike supervised learning, which learns from given sample data or examples, RL learns by interacting with the environment: the agent receives a delayed reward at the next time step, which it uses to evaluate its previous action. The problem to be solved in RL is formalised as a Markov Decision Process, i.e. a problem of sequential decision making: it describes the sequence of decisions an agent must make.
A short glossary of Reinforcement Learning
Here are some fundamental terms used in Reinforcement Learning:
- Agent: anything (usually a piece of software) which perceives its environment, takes actions autonomously in order to achieve goals, and receives a reward based on these actions.
- Environment: the representation of the problem to be solved. It may be a real-world environment or a simulated one with which our agent will interact.
- Reward: the numerical value that the agent receives for performing a specific action or task at some state(s) in the environment. The value can be positive or negative.
- State: a representation of the current world or environment of the task. Agents can perform actions to change these states, so the state indicates what situation the agent is currently in.
- Policy: the strategy that the agent employs to determine the next action based on the current state.
How Reinforcement Learning works
Reinforcement learning is like training your dog to do tricks: you provide treats as a reward if your pet performs the trick you desire; otherwise, you punish it by withholding the treats.
Of course, RL is more complex than this, but at its core it is about learning via interaction and feedback, in other words learning to solve a task by trial and error: acting in an environment and receiving rewards for it. Essentially, an agent (or several) is built so that it can perceive and interpret the environment in which it is placed, then take actions and interact with it.
The environment starts by sending a state to the agent, which, based on its knowledge, decides what action to perform in response to that state. The environment then "sends" back a pair consisting of the next state and a reward. The agent updates its knowledge with the reward returned by the environment to evaluate its last action. The loop continues until the environment sends a terminal state, which ends the episode.
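The loop described above can be sketched in a few lines of Python. Everything here is hypothetical: a toy one-dimensional environment where the agent starts at position 0 and the episode ends, with a reward, when it reaches position 3.

```python
class ToyEnvironment:
    """A hypothetical environment: positions 0..3, position 3 is the goal."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is +1 (move right) or -1 (move left)
        self.state = max(0, self.state + action)
        reward = 1.0 if self.state == 3 else 0.0   # reward only at the goal
        done = self.state == 3                      # terminal state ends the episode
        return self.state, reward, done

def run_episode(policy, env):
    """The loop from the text: state -> action -> (next state, reward), until terminal."""
    state, total_reward, done = env.state, 0.0, False
    while not done:
        action = policy(state)                      # agent decides based on the state
        state, reward, done = env.step(action)      # environment answers with (state, reward)
        total_reward += reward
    return total_reward

# A trivial policy that always moves right reaches the goal and earns the reward.
always_right = lambda state: +1
print(run_episode(always_right, ToyEnvironment()))  # 1.0
```

In a real RL system the policy would be learned rather than hard-coded, but the interaction protocol is exactly this loop.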
The main types of RL
RL can be split roughly into model-based and model-free techniques.
Model-based RL infers optimal behavior using a predictive model of the environment: the agent asks questions of the form "what will happen if I do x?" to choose the best x, learning the model by performing actions and observing the results, which include the next state and the immediate reward. This learned model of the environment, required for model-based RL but not for model-free, is the key difference between the two families. A policy comprises the suggested actions that the agent should take for every possible state: model-based learning attempts to model the environment and then derives the optimal policy from the learned model. However, model-based algorithms become impractical as the state space and action space grow.
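As an illustration of planning with a model, here is a minimal value-iteration sketch on a hypothetical four-state chain (the model function, states and rewards are all invented for the example). The point is that the agent can ask the model "what will happen if I do x?" and plan entirely inside it, without interacting with the real environment.

```python
# Hypothetical known model: 4 states in a row, actions move left (-1) or right (+1),
# reward 1 for entering the terminal goal state 3.
N_STATES, GOAL, GAMMA = 4, 3, 0.9

def model(state, action):
    """The predictive model: answers 'what happens if I do x in this state?'."""
    next_state = min(GOAL, max(0, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Value iteration: repeatedly back up the best one-step outcome under the model.
V = [0.0] * N_STATES
for _ in range(50):
    for s in range(GOAL):   # the goal state is terminal, its value stays 0
        V[s] = max(model(s, a)[1] + GAMMA * V[model(s, a)[0]] for a in (-1, +1))

# The optimal policy falls out of the learned values: act greedily w.r.t. them.
policy = [max((-1, +1), key=lambda a: model(s, a)[1] + GAMMA * V[model(s, a)[0]])
          for s in range(GOAL)]
print(policy)   # [1, 1, 1]: always move right toward the goal
```

Note that the table `V` has one entry per state; this is exactly what stops scaling when the state space grows large.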
In a model-free approach, on the other hand, the agent relies on trial-and-error experience to update its knowledge. As a result, it does not require space to store a model of all the combinations of states and actions, in contrast to the previous approach. Within model-free methods there is a further distinction: "on-policy" algorithms learn from the actions taken by the current policy, whereas "off-policy" algorithms can learn from actions generated by a different policy, for example past or exploratory behaviour. In mobile advertising, the model-free approach is the category typically used, since there is a huge amount of data to be managed and you cannot rely on old data to predict insights.
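A classic model-free example is tabular Q-learning, sketched below on a hypothetical toy task (the states, rewards and hyperparameters are invented for illustration). The agent stores only a table of state-action values and improves it purely from observed trial-and-error transitions, with no model of the environment.

```python
import random

# Hypothetical toy task: 4 states in a row, goal at state 3; actions 0 = left, 1 = right.
N_STATES, GOAL = 4, 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # the Q-table: no model of the environment

def step(state, action):
    """One environment transition: observed by the agent, never modelled."""
    next_state = min(GOAL, max(0, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

random.seed(0)
for _ in range(300):                         # episodes of trial and error
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # the Q-learning update: learn from this single observed transition
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
        state = next_state

# After training, the greedy action in every non-terminal state is "right" (1).
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)])
```

Q-learning is an off-policy method: the update uses the best action in the next state, not necessarily the one the exploring agent actually takes.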
Reinforcement Learning in Mobile Advertising
Managing and understanding the growing volume of data produced by various media sources for marketing goals is impossible using older technologies. Here is where machine learning becomes useful. Reinforcement learning, which, as described earlier, learns from interaction and rewards rather than from labelled examples, is also used by advertisers, in so-called "machine learning based advertising".
There are several reasons why ML is fundamental in mobile advertising app-install campaigns. First, since the payment model is based upon conversions (i.e. app installs or post-install actions), brands only pay for impressions delivered to the right people, while the mobile DSP pays for every impression. Second, machine learning restores the ability to measure an ad's effectiveness with real-time reporting, which has been reduced by changes in the attribution model. In fact, since privacy is a key feature of the changes introduced by Apple's iOS updates, precise targeting has become difficult. As advertising gets more complex, you need to rely on the analytical and real-time optimisation capabilities that an algorithm can provide.
How Programmatic Advertising can be improved by Reinforcement Learning
As stated above, mobile DSPs, with their vast scale of data, are able to leverage machine learning in user acquisition campaigns for app marketing. For marketers, it's important to understand how their media partners use machine learning to optimize campaigns. DSPs, exchanges, and other sell-side players use machine learning in programmatic campaigns, particularly ones that involve real-time bidding.
RL is useful in app-install campaigns, for which programmatic advertising has become crucial. Programmatic advertising is the automated process of buying and selling ad inventory through an exchange. Programmatic advertising leverages algorithms in order to improve:
- cost efficiency, making better budget decisions and increasing the ROI;
- campaign value, in terms of quality of traffic and leads.
So, a machine learning algorithm can be used in programmatic campaigns to identify patterns and signals that allow real-time, actionable decision-making and performance optimisation. The ML model learns and gets better over time at deciding where and when to run your ads. Programmatic buying is highly targeted, multi-channel, and AI-driven, and can therefore deliver a massive ROI.
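A minimal way to picture this kind of real-time optimisation is a multi-armed bandit. The sketch below is purely illustrative: the creative names and click-through rates are made up, and a production DSP would use far richer models, but it shows how an algorithm can learn where to spend impressions by balancing exploration and exploitation.

```python
import random

# Hypothetical click-through rates for three ad creatives (unknown to the algorithm).
TRUE_CTR = {"banner_a": 0.02, "banner_b": 0.05, "banner_c": 0.03}

def epsilon_greedy_campaign(impressions=50_000, epsilon=0.1, seed=42):
    """Epsilon-greedy bandit: each impression, either explore a random creative
    or exploit the one with the best observed click rate so far."""
    rng = random.Random(seed)
    shows = {ad: 0 for ad in TRUE_CTR}
    clicks = {ad: 0 for ad in TRUE_CTR}
    for _ in range(impressions):
        if rng.random() < epsilon:
            ad = rng.choice(list(TRUE_CTR))   # explore a random creative
        else:                                  # exploit the current best estimate
            ad = max(TRUE_CTR, key=lambda a: clicks[a] / shows[a] if shows[a] else 0.0)
        shows[ad] += 1
        if rng.random() < TRUE_CTR[ad]:        # simulated user response
            clicks[ad] += 1
    # return the creative with the best observed performance
    return max(TRUE_CTR, key=lambda a: clicks[a] / max(shows[a], 1))

print(epsilon_greedy_campaign())   # converges on "banner_b", the best creative
```

The same explore/exploit trade-off underlies the RL methods discussed earlier; a bandit is simply the single-state special case.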
App-install campaigns present even more difficulties than other kinds of mobile advertising campaigns, due to the attribution technology used to track conversions. Attribution models rely heavily on technology, and the ability to track conversions differs from one platform to another (or from one connection provider to another, for example), since, as many of you are probably aware, they partly rely on probabilistic models. In this context we are not even referring to SKAdNetwork attribution, which introduces even bigger difficulties because of delayed postbacks and other data limitations introduced by Apple to improve the privacy of final users. All these problems can be addressed only using machine learning techniques, such as reinforcement learning.
User Segmentation with Clustering
Given the various difficulties that occur in conversion tracking, less granular and less accurate data are available; hence the use of precise predictive modeling techniques that allow user segmentation and cohort reporting is not a luxury, but a must. The way this is done is through clustering, which can be construed as a "broad pattern-seeking" approach: instead of predicting the correct output, models are tasked with finding patterns, similarities and deviations that can then be applied to other data exhibiting similar behaviour.
So, a clustering algorithm is a technique that infers labels by forming groups, assisting user segmentation by grouping similar customers into the same segment. Clustering algorithms are helpful in understanding customers, in terms of both static demographics and dynamic behaviors. Within the context of mobile advertising, customers with comparable features often interact with ads similarly, so businesses can benefit from this technique by creating a tailored marketing strategy for each segment. Applying clustering to mobile advertising yields high-precision profiling of users, which leads to more effective advertising campaigns thanks to more specific targeting and the discovery of previously unknown classes of users.
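A minimal sketch of this idea is k-means clustering. The user features below (sessions per week and average spend) and the numbers are hypothetical; the algorithm alternates between assigning each user to the nearest segment centroid and moving each centroid to the mean of its segment.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: infer segment labels by forming k groups of similar points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: each user joins the nearest centroid's segment
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # update step: move each centroid to the mean of its segment
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centroids, clusters

# Two hypothetical user groups: casual users and heavy spenders,
# described by (sessions per week, average spend) pairs.
users = [(1, 0.5), (2, 1.0), (1, 0.8), (9, 20.0), (10, 22.0), (8, 19.0)]
centroids, segments = kmeans(users, k=2)
print(sorted(len(s) for s in segments))   # [3, 3]: the two behavioural groups
```

No labels were provided, yet the two behavioural segments emerge from the data alone; each could then receive its own tailored campaign.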
Conclusions
Most marketers face challenges in determining the right content to fulfil their advertising goals. But by implementing reinforcement learning, which is reward-based and connects positive actions to desired results, the choice of the most suitable content for advertising campaigns can be optimized. Reinforcement learning reduces the number of times non-optimal content is presented, thereby maximizing profits. With reinforcement learning in place, the algorithm can recommend more suitable keywords, videos, photos, and other content from an extensive online marketing library, enabling advertisers to deliver the most suitable choices for their targeting.
Here at Mapendo, Jenga, our proprietary AI technology based on reinforcement learning, collects tens of thousands of data points related to a given topic, finds patterns, predicts the possible outcome of a marketing campaign, and finds the audience most likely to convert for a given type of ad. Our algorithm has been trained to optimize traffic according to the client's KPIs, maximize user retention and generate post-install actions. Advertisers need to leverage technology to find meaningful insights, predict outcomes and maximise the efficiency of their investment, by choosing the right channels and budget.