February 24, 2021

How to Use a Neural Network to Determine the Best Sending Time for an Email and Increase Mailing Revenue by 8.5 times

To make sure that mailings aren't overlooked, and to increase the likelihood that customers will open them and buy products, it is important to send them at the right time, known as the Best Sending Time. We used a neural network to predict the Best Sending Time for emails that recommend products when the customer is most likely to want to buy them. We tested this approach on a pet store's mailings with a repurchase offer and evaluated the results with A/B tests. Our results:

  • Targeted emails sent with a neural network (compared to triggers): increased 23 times
  • Revenue from messages via the email channel (last-click attribution): increased 8.5 times
  • Unsubscribed customers: decreased 2 times
  • Opened emails (absolute value): increased 17 times

This article will be useful for developers who want to discover how to use neural networks and understand how the LSTM model works. It is also for managers and marketers who want to increase email channel revenue. Below we share our experience with you:

  • Why we decided to use the LSTM neural network model to predict the best date for sending an email instead of using a gradient boosting algorithm;
  • How LSTM works;
  • Which data the neural network uses for learning;
  • What kind of neural network architecture was used and the challenges we encountered;
  • What type of results were achieved and how they were evaluated.

Why Predict the Best Date for Sending an Email?

Emails help inform customers about new products, reactivate churned customers, and show personalized recommendations. The Best Sending Time differs from customer to customer. For example, one customer prefers shopping on weekends, so their Best Sending Time is Saturday. Another customer has recently bought a cat bed, so their Best Sending Time for, say, cat food recommendations is as soon as possible. The neural network helped us determine the Best Sending Time for an email and better anticipate a customer’s future needs.

At the same time, our goal is not to spam or flood customers with messages, but to send emails selectively and in the right quantity. The algorithm determines to whom and when to send an email so that supply matches demand.

For example, a customer ordered dog food and grooming products from a store. After some time, they will run out and have to buy more. To avoid missing this opportunity, the algorithm calculates whether and when to send a reminder. The customer receives an email with a link they can use to reorder the products. This way, the store makes a well-timed offer, and the customer restocks when they need to without spending time searching for the right products on the store’s website.

Next Best Action is Mindbox’s algorithm for determining the Best Sending Time for an email.

Why Did We Decide to Abandon the Gradient Boosting Algorithm and Choose LSTM Instead?

We first used standard algorithms to predict the Best Sending Time for an email. We spent an entire year creating features from customers’ action histories and training gradient boosting to predict the Best Sending Time.
For example:

  • We calculated how many days passed between purchases;
  • We tried to frame the task as classification over these features and predict the probability that an email should be sent on a given day;
  • We tried to determine a user’s interests based on where they live to increase the likelihood of email opens and clicks.
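The inter-purchase features from the first bullet can be sketched as follows. This is a minimal illustration, assuming one list of purchase dates per customer; the function and feature names (`purchase_interval_features`, `mean_days_between_purchases`, `overdue_ratio`) are hypothetical, since the exact feature set of the gradient boosting model is not described here.

```python
from datetime import date
from statistics import mean
from typing import Dict, List


def purchase_interval_features(purchase_dates: List[date], today: date) -> Dict[str, float]:
    """Hypothetical gradient-boosting features built from one customer's purchase dates."""
    if not purchase_dates:
        raise ValueError("need at least one purchase")
    dates = sorted(purchase_dates)
    # Days between each pair of consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    days_since_last = float((today - dates[-1]).days)
    avg_gap = float(mean(gaps)) if gaps else float("nan")
    return {
        "n_purchases": float(len(dates)),
        "mean_days_between_purchases": avg_gap,
        "days_since_last_purchase": days_since_last,
        # Above 1.0: the customer's usual repurchase interval has already passed.
        "overdue_ratio": days_since_last / avg_gap if gaps and avg_gap > 0 else float("nan"),
    }


features = purchase_interval_features(
    [date(2021, 1, 1), date(2021, 1, 31), date(2021, 3, 2)], today=date(2021, 3, 17)
)
print(features)
```

A table of such rows, one per customer and day, is the kind of input a gradient boosting classifier would consume.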

However, this model didn’t produce consistently positive results across all companies. It couldn’t identify complex patterns in user behavior and didn’t bring in any additional revenue.

At one point, we were ready to abandon the idea of predicting an email’s Best Sending Time altogether. But then we decided to try something completely different: train an LSTM neural network and see whether it could reach our goal. LSTMs are usually used for text analysis, less often for stock price analysis in financial markets, and rarely for marketing purposes. To our pleasant surprise, the LSTM worked!

What is LSTM?

LSTM (Long Short-Term Memory) is a recurrent neural network architecture that became widely used in natural language processing.

Let’s look at how an LSTM works, using machine translation as an example. The letters of the text are fed into the neural network one at a time, and we want to receive a translation into another language as the output. To translate a text, the network must store information not only about the current letter but also about the letters that came before it. An ordinary feedforward neural network does not remember what it was shown before, so it cannot translate a whole word or text. An LSTM, by contrast, has special memory cells that store useful information, so its output is based on everything it has seen so far, and it translates the text taking all the letters in the words into account. Over time, the network can also clear its cells and forget information that is no longer needed.
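To make "memory cells" and "forgetting" concrete, here is a toy single-unit LSTM cell in plain Python using the standard gate equations. It is an illustrative sketch, not the production model: the weights are hand-picked rather than learned, and all names (`Gate`, `ScalarLSTMCell`, `run`) are hypothetical. One cell keeps a signal in memory; the other, with a closed forget gate, clears it.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Gate:
    """Scalar weights of one gate: w_x * x + w_h * h_prev + b."""
    w_x: float
    w_h: float
    b: float

    def preactivation(self, x: float, h_prev: float) -> float:
        return self.w_x * x + self.w_h * h_prev + self.b


@dataclass
class ScalarLSTMCell:
    forget_gate: Gate
    input_gate: Gate
    candidate: Gate
    output_gate: Gate

    def step(self, x: float, h_prev: float, c_prev: float) -> Tuple[float, float]:
        f = sigmoid(self.forget_gate.preactivation(x, h_prev))      # share of old memory to keep
        i = sigmoid(self.input_gate.preactivation(x, h_prev))       # share of new content to write
        c_new = math.tanh(self.candidate.preactivation(x, h_prev))  # candidate content
        c = f * c_prev + i * c_new                                  # updated memory cell
        o = sigmoid(self.output_gate.preactivation(x, h_prev))      # share of memory to expose
        h = o * math.tanh(c)                                        # hidden state (output)
        return h, c


def run(cell: ScalarLSTMCell, xs: List[float]) -> List[Tuple[float, float]]:
    """Feed inputs one at a time and return (h, c) after every step."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append((h, c))
    return states


# Hand-picked weights: the input gate opens only when x == 1.
write_on_signal = Gate(w_x=20.0, w_h=0.0, b=-10.0)
content = Gate(w_x=5.0, w_h=0.0, b=0.0)
half_open = Gate(w_x=0.0, w_h=0.0, b=0.0)

remembering = ScalarLSTMCell(Gate(0.0, 0.0, 10.0), write_on_signal, content, half_open)
forgetting = ScalarLSTMCell(Gate(0.0, 0.0, -10.0), write_on_signal, content, half_open)

signal = [1.0, 0.0, 0.0, 0.0]  # one important event, then nothing
print(run(remembering, signal)[-1][1])  # close to 1: the event is still in memory
print(run(forgetting, signal)[-1][1])   # close to 0: the memory was cleared
```

Real LSTM layers work with vectors and weight matrices, but each unit follows this same update.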

The same principle turned out to be crucial for predicting user actions: the neural network takes into account a user’s entire history of actions and produces relevant predictions, including the Best Sending Time for an email.

The Internal Structure of a Single LSTM Layer:

Inside, an LSTM layer consists of addition (+), multiplication (×), sigmoid (σ), and hyperbolic tangent (tanh) operations.
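For reference, the commonly used LSTM cell (with a forget gate) can be written using exactly these four operations. This is the textbook formulation, assumed here for illustration; the figure may depict a slightly different variant. Here $x_t$ is the current input, $h_{t-1}$ the previous output, $c_t$ the memory cell, and $\odot$ element-wise multiplication (the × in the diagram):

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell update} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{output}
\end{aligned}
```

The forget gate $f_t$ is what lets the network clear information that is no longer needed: when it is close to 0, the old memory $c_{t-1}$ is discarded.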

What Kind of Data Does a Neural Network Use?

We feed the network a sequence that contains nine types of action tokens, interleaved with the time elapsed between consecutive actions:

  • Purchase a cheap product,
  • Purchase an average-priced product,
  • Purchase an expensive product,
  • View a cheap product,
  • View an average-priced product,
  • View an expensive product,
  • Receive an email,
  • Open an email,
  • Click on any object inside the email.
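Turning raw events into such a sequence might look like the sketch below. It assumes each event carries a timestamp and, for views and purchases, a price; the price thresholds (`CHEAP_MAX`, `MEDIUM_MAX`), the `Event` class, and token names beyond those in the article’s examples (such as `email_click`) are hypothetical. Intervals are expressed in hours, matching the example that follows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Union

# Hypothetical price thresholds; real buckets would depend on the store's catalog.
CHEAP_MAX = 500.0
MEDIUM_MAX = 2000.0

# Email events map directly to tokens ("email_click" is an illustrative name).
EMAIL_TOKENS = {"email_received": "email_show", "email_open": "email_open", "email_click": "email_click"}


@dataclass
class Event:
    kind: str                      # "purchase", "view", or a key of EMAIL_TOKENS
    timestamp: datetime
    price: Optional[float] = None  # set for purchases and views only


def price_bucket(price: float) -> str:
    if price < CHEAP_MAX:
        return "cheap"
    if price < MEDIUM_MAX:
        return "medium"
    return "expensive"


def to_token(event: Event) -> str:
    if event.kind in ("purchase", "view"):
        prefix = "buy" if event.kind == "purchase" else "view"
        return f"{prefix}_{price_bucket(event.price)}"
    return EMAIL_TOKENS[event.kind]


def encode(events: List[Event]) -> List[Union[str, float]]:
    """Interleave tokens with the hours elapsed between consecutive events."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    sequence: List[Union[str, float]] = []
    for i, event in enumerate(ordered):
        if i > 0:
            gap = event.timestamp - ordered[i - 1].timestamp
            sequence.append(round(gap.total_seconds() / 3600, 2))
        sequence.append(to_token(event))
    return sequence


history = [
    Event("view", datetime(2021, 2, 1, 10, 0), price=1200.0),
    Event("view", datetime(2021, 2, 1, 10, 30), price=300.0),
    Event("purchase", datetime(2021, 2, 2, 10, 30), price=300.0),
]
print(encode(history))  # ['view_medium', 0.5, 'view_cheap', 24.0, 'buy_cheap']
```

Before training, the tokens would be mapped to integer IDs for the embedding layer; that step is omitted here.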

This is a typical example of an input sequence:

(view_medium, 0.5, view_cheap, 24, buy_cheap)

Intervals are given in hours. A user with such a sequence looked at an average-priced product, then looked at a cheap product half an hour later, and a day later bought a cheap product.

The user’s last five actions serve as the target variable, so the neural network learns to predict a user’s next actions.
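Splitting one user’s history into an input and a five-action target can be sketched as follows. This assumes each action is stored as a (token, hours since the previous action) pair; the `split_history` helper and the sample history are hypothetical.

```python
from typing import List, Tuple

Action = Tuple[str, float]  # (token, hours since the previous action)


def split_history(actions: List[Action], n_target: int = 5) -> Tuple[List[Action], List[Action]]:
    """Everything except the last n_target actions is the input; those actions are the target."""
    if len(actions) <= n_target:
        raise ValueError("history too short to hold both input and target")
    return actions[:-n_target], actions[-n_target:]


history = [
    ("view_medium", 0.0), ("view_cheap", 0.5), ("buy_cheap", 24.0),
    ("email_show", 300.0), ("email_open", 2.0), ("view_cheap", 0.1),
    ("buy_cheap", 1.5),
]
model_input, target = split_history(history)
print(len(model_input), len(target))  # 2 5
```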

What Kind of Neural Network Architecture Was Used?

Our first attempts to train the neural network were unsuccessful. The network kept collapsing to the most frequent outcome: it always predicted that an email would be received, but never other actions, such as opening an email or making a purchase. Customers receive emails far more often than they open them or buy something, so “email received” is the most frequent token. By standard metrics, the network’s predictions looked good, but in practice the result was useless: there is no point in an algorithm that only reports that a customer will receive an email and nothing else.

For example, suppose a user’s actual sequence consists of three “email received” tokens and one “item purchased” token, and the neural network predicts four “email received” tokens. The network is right in 3 out of 4 cases, since the customer does in fact receive an email, but such a prediction is of little use. The main task is to predict when the customer will open an email and make a purchase.
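The arithmetic of this example can be checked directly: a predictor that always outputs the majority token scores 75% accuracy while never catching the purchase. The helper names below are hypothetical, and `buy_cheap` stands in for “item purchased”.

```python
from typing import List


def accuracy(predicted: List[str], actual: List[str]) -> float:
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(predicted: List[str], actual: List[str], token: str) -> float:
    """Share of the actual occurrences of `token` that were predicted correctly."""
    positions = [i for i, a in enumerate(actual) if a == token]
    if not positions:
        return float("nan")
    return sum(predicted[i] == token for i in positions) / len(positions)


actual = ["email_show", "email_show", "email_show", "buy_cheap"]
predicted = ["email_show"] * 4  # always the most frequent token

print(accuracy(predicted, actual))             # 0.75
print(recall(predicted, actual, "buy_cheap"))  # 0.0
```

This is why overall accuracy alone can look good for a network that has learned nothing useful about opens and purchases.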

After testing several architectures and training methods, we found what works.

We settled on a Seq2Seq (sequence-to-sequence) architecture, in which the network consists of two parts: an encoder and a decoder. The encoder is small and consists of LSTM and embedding layers. The decoder, in addition to LSTM and embedding layers, uses self-attention and dropout. During training, we use teacher forcing: the decoder is usually given the true previous action as input, and only sometimes its own forecast, which then serves as input for the next prediction.

The encoder compresses the input sequence into a vector that holds the information about the user’s actions that the network considers important. The decoder then decodes this vector into a sequence, which is the network’s forecast.

Seq2Seq (sequence-to-sequence) is a class of machine learning models that transform one sequence into another. In our case, the input sequence is the user’s past actions, and the output is a forecast of their future actions.
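The teacher forcing step can be illustrated with a decoding loop in which a ratio controls how often the true previous token, rather than the model’s own prediction, is fed back in. This is a sketch under simplifying assumptions: `decode`, `toy_step`, and the `NEXT` lookup table are hypothetical stand-ins for a real decoder step (which would also carry LSTM state and attention), and the ratio actually used in training is not stated.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def decode(
    step: Callable[[str], str],
    start_token: str,
    length: int,
    targets: Optional[List[str]] = None,
    teacher_forcing_ratio: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Unroll a decoder for `length` steps and return (predictions, inputs_fed).

    With probability `teacher_forcing_ratio`, the next input is the true
    previous token from `targets`; otherwise it is the model's own prediction.
    """
    rng = rng or random.Random()
    prev = start_token
    predictions: List[str] = []
    inputs_fed: List[str] = []
    for t in range(length):
        inputs_fed.append(prev)
        pred = step(prev)
        predictions.append(pred)
        use_truth = targets is not None and rng.random() < teacher_forcing_ratio
        prev = targets[t] if use_truth else pred
    return predictions, inputs_fed


# Toy stand-in for one decoder step: a lookup table instead of LSTM + attention.
NEXT: Dict[str, str] = {"<start>": "email_show", "email_open": "view_cheap", "view_cheap": "buy_cheap"}


def toy_step(prev: str) -> str:
    return NEXT.get(prev, "email_show")


truth = ["email_open", "view_cheap", "buy_cheap"]
print(decode(toy_step, "<start>", 3, truth, teacher_forcing_ratio=1.0))
print(decode(toy_step, "<start>", 3, truth, teacher_forcing_ratio=0.0))
```

With full teacher forcing, each step sees the true history; without it, the toy model keeps feeding itself `email_show` and never reaches a purchase, which mirrors the collapse described earlier.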

Obtaining a Prediction Using an LSTM Network


Training time: the model was trained for about a day on a Tesla V100 GPU and, after training, achieved a ROC-AUC of 0.74.
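ROC-AUC can be read as the probability that a randomly chosen positive example (for instance, a user who did open or buy) gets a higher score than a randomly chosen negative one, so 0.74 is well above the 0.5 of random guessing. The sketch below computes it from that pairwise (Mann-Whitney) definition; which event the 0.74 was measured on is not specified here, and in practice a library function such as scikit-learn’s `roc_auc_score` would be used instead.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The double loop is quadratic and only meant for illustration; rank-based implementations scale to real datasets.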

How Does the LSTM Model Work with Real Data (Inference)?

To apply the model to a user and decide whether to send them an email, we collect the user’s recent actions into a sequence and run it through the neural network. Suppose, for example, that the network’s response is as follows:

(email_show, 10, email_open, 0.5, view_cheap, 0.5, view_medium, 15, buy_medium)

As shown, the model predicts not only specific actions but also the time between them. We exclude all events predicted to occur more than 24 hours from now. They will be processed the next day, because new information about the customer’s actions may arrive in the meantime and will need to be taken into account. We get the following sequence:

(email_show, 10, email_open, 0.5, view_cheap, 0.5, view_medium)

Since there is a view token in the sequence, the customer will receive an email today.

It is important to send an email only if the predicted sequence contains a view or purchase token, not just a received-email token. That way, the network will not simply repeat the trigger messages it learned from the data. For example, if we did not take views and purchases into account, we could get a sequence consisting only of received-email tokens, and the neural network would then duplicate the marketer’s trigger settings instead of predicting an email opening or purchase:

(email_show, 10, email_show, 15, email_show)
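The inference rule (keep only events within 24 hours, then send only if a view or purchase token remains) can be sketched as below. This assumes the first predicted event happens at hour 0, that is, now, and that intervals are in hours; the helper names (`to_timeline`, `within_horizon`, `should_send_today`) are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Item = Union[str, float]


def to_timeline(sequence: Sequence[Item]) -> List[Tuple[str, float]]:
    """Convert (token, gap, token, gap, ...) into (token, hours from the first event)."""
    timeline: List[Tuple[str, float]] = []
    elapsed = 0.0
    for position, item in enumerate(sequence):
        if position % 2 == 0:   # even positions hold action tokens
            timeline.append((str(item), elapsed))
        else:                   # odd positions hold hours until the next action
            elapsed += float(item)
    return timeline


def within_horizon(timeline: List[Tuple[str, float]], horizon_hours: float = 24.0) -> List[Tuple[str, float]]:
    """Drop events predicted beyond the horizon; they are re-evaluated the next day."""
    return [(token, t) for token, t in timeline if t <= horizon_hours]


def should_send_today(predicted: Sequence[Item], horizon_hours: float = 24.0) -> bool:
    """Send only if a view or purchase is predicted; email-only forecasts are ignored."""
    tokens = [token for token, _ in within_horizon(to_timeline(predicted), horizon_hours)]
    return any(token.startswith(("view_", "buy_")) for token in tokens)


predicted = ["email_show", 10, "email_open", 0.5, "view_cheap", 0.5, "view_medium", 15, "buy_medium"]
only_emails = ["email_show", 10, "email_show", 15, "email_show"]

print(within_horizon(to_timeline(predicted)))
print(should_send_today(predicted))    # True
print(should_send_today(only_emails))  # False
```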

How Did We Evaluate Results?

As a baseline, we used an algorithm that tracks the average time between a user’s purchases and sends an email once that time has passed. Half of the users received emails according to the baseline’s decisions, and the other half received emails based on the model’s predictions.
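The baseline rule and a 50/50 user split can be sketched as follows. The split via a salted hash of the user ID is an assumption (a common way to keep each user in the same group for the whole test), not necessarily how this test assigned users; `ab_group`, `baseline_should_send`, and the salt value are hypothetical.

```python
import hashlib
from datetime import date
from statistics import mean
from typing import List


def ab_group(user_id: str, salt: str = "best-sending-time-test") -> str:
    """Deterministically assign a user to one of two roughly equal groups."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return "model" if digest[0] % 2 else "baseline"


def baseline_should_send(purchase_dates: List[date], today: date) -> bool:
    """Baseline: send once the user's average time between purchases has passed."""
    dates = sorted(purchase_dates)
    if len(dates) < 2:
        return False  # no interval to average yet
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return (today - dates[-1]).days >= mean(gaps)


purchases = [date(2021, 1, 1), date(2021, 1, 31), date(2021, 3, 2)]  # every 30 days
print(baseline_should_send(purchases, date(2021, 3, 20)))  # False: 18 days since last purchase
print(baseline_should_send(purchases, date(2021, 4, 1)))   # True: 30 days have passed
print(ab_group("user-42"))
```

A production version would also remember that the reminder was already sent, so the same user is not emailed every day after the interval passes.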

The test lasted two weeks and reached statistical significance. The neural network found 23 times more users to email, while the open rate dropped by only 5%. The absolute number of opened emails increased 17 times.
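The three reported numbers are linked by simple arithmetic. Assuming the number of openings equals the number of emailed users $N$ times the open rate $r$, the ratios between the model group and the baseline group multiply:

```latex
\frac{O_{\text{model}}}{O_{\text{base}}}
  = \frac{N_{\text{model}}}{N_{\text{base}}} \cdot \frac{r_{\text{model}}}{r_{\text{base}}}
  \quad\Longrightarrow\quad
  17 = 23 \cdot \frac{r_{\text{model}}}{r_{\text{base}}}
  \quad\Longrightarrow\quad
  \frac{r_{\text{model}}}{r_{\text{base}}} = \frac{17}{23} \approx 0.74
```

A 5% relative drop would give about $23 \times 0.95 \approx 21.9$ times more openings, so the 17-fold figure matches a drop of about 5 percentage points, which corresponds to a baseline open rate of roughly $0.05 / (1 - 17/23) \approx 19\%$.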

The A/B Test Results for the LSTM Neural Network Model and Conclusions

A/B test results

Our experiment with using a neural network instead of a classical algorithm turned out to be successful. The LSTM neural network model proved to be an effective tool for predicting the Best Sending Time for an email. This experience taught us to be confident in pushing the envelope: sometimes unconventional tools are the best solution to ordinary problems.

This case study is from Mindbox, the original brand behind Maestra’s technology.