Ever since the words Machine Learning were first uttered here at Mindbox, we’ve been plotting to build a Big Green Button. That’s a big button that takes up the whole screen, and when you press it everything just starts whirring and spinning and making money all by itself. Our RFM analysis tool isn't a Big Green Button, but it's a step in that direction. It’s only a Little Green Button, but it does some very clever stuff when you press it. Like automatically segmenting your customer database for targeted email campaigns...
Green button
Our little green button
To make it work, we built ourselves an automated RFM segmentation tool and designed a special report to display its results in a clear and intuitive format. This is the story of how that all came about, and why you can now do away with analysts and spend more time on the things that really matter.

What is RFM analysis?

The results of an email campaign depend on the size of your audience and the quality of the campaign itself. You can’t keep increasing your audience size forever, so at some point you have to increase your quality. This means personalizing your campaigns, because everyone is an individual and we all want different things.
Segment and conquer
Creating an individual email for every customer is out of the question because we tend to have an awful lot of them, and that’s why marketers group customers into segments. RFM analysis is one of many ways of segmenting a customer database. So what’s special about the segments that RFM analysis produces? RFM segments are non-overlapping groups of customers. In RFM analysis we evaluate each customer according to three metrics, or dimensions:
  • R (Recency) — how long ago the customer last made a purchase.
  • F (Frequency) — how frequently they make purchases.
  • M (Monetary) — how much they spend.
Many marketing agencies use RFM analysis as a matter of course, and we’re no exception.
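To make the three dimensions concrete, here is a minimal sketch of how R, F and M might be derived from a purchase log. The data layout, the function name rfm_table and the choice to measure frequency as a purchase count and monetary value as total spend are assumptions made for illustration, not a description of the Mindbox implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount)
purchases = [
    ("alice", date(2024, 1, 10), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 250.0),
]

def rfm_table(purchases, today):
    """Return {customer_id: {"recency_days", "frequency", "monetary"}}."""
    by_customer = defaultdict(list)
    for customer_id, day, amount in purchases:
        by_customer[customer_id].append((day, amount))

    table = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(day for day, _ in rows)
        table[customer_id] = {
            "recency_days": (today - last_purchase).days,   # R: days since last purchase
            "frequency": len(rows),                         # F: number of purchases
            "monetary": sum(amount for _, amount in rows),  # M: total spend
        }
    return table

rfm = rfm_table(purchases, today=date(2024, 3, 31))
print(rfm["alice"])
```

Every segmentation method discussed in this article starts from a table like this one, with one row per customer.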

RFM analysis methods

The various approaches to RFM analysis are essentially very similar. For each of the RFM dimensions, customers are divided into groups (usually no more than five). The intersections of those groups are our segments.
RFM-based segmentation
A 3D representation of customer segmentation by RFM analysis. Each of the dimensions is split into three groups, and some groups are further broken down into subgroups.
If we were to break each of the three dimensions down into four groups we would get 64 (4x4x4) customer segments. If we used five groups we would have 125 segments. The biggest challenge is to define the boundaries of the groups, as there are no hard and fast rules as to how this should be done. Let’s compare the most popular methods of doing this using a typical customer database as an example:
Example customer distribution chart
An example distribution of customers based on spend (M) and recency of their last purchase (R).
For ease of interpretation we’re using just two dimensions, R and M. In our example:
  • Total spend is between 0 and 300 dollars.
  • Last purchase is between one hour and 240 days ago.

Method 1. Division by equal range

In this method we create our groups by dividing our chart axes into ranges of equal size. In this case we create three spend ranges: from 0 to 100; from 100 to 200; and from 200 to 300 dollars. We also divide the recency axis into three ranges: up to 80 days; from 80 to 160 days; and from 160 days upwards. This gives us nine segments.
Division by equal range
In this approach, the majority of our customers are in the segment in the bottom-left corner.
Advantages:
  • Easy to automate.
  • Identifies “extreme” groups - the biggest spenders, the most frequent buyers, and those who have not purchased for the longest time.
Disadvantages:
  • Uneven distribution of customers among segments: in our example, 86% of all our customers are grouped in one segment, 13% are in another and the remaining 1% are scattered among the remaining seven segments.
  • Same number of groups for each dimension.
  • Lots of segments (even if we just divide each of the three dimensions by 3, we get 27 segments).
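Division by equal range can be sketched in a few lines. The helper name equal_range_group, the sample customers and the rule that a boundary value belongs to the upper range are assumptions for illustration. The axis limits (0 to 240 days, 0 to 300 dollars) come from the example above.

```python
def equal_range_group(value, low, high, groups=3):
    """Return the 0-based index of the equal-width range that contains value."""
    width = (high - low) / groups
    index = int((value - low) // width)
    # Clamp so that value == high lands in the last group, not in a new one.
    return min(max(index, 0), groups - 1)

# Hypothetical customers: name -> (days since last purchase, total spend)
customers = {"a": (5, 20), "b": (100, 150), "c": (240, 300)}

# A segment is the pair (recency group, spend group).
segments = {
    name: (equal_range_group(recency, 0, 240), equal_range_group(spend, 0, 300))
    for name, (recency, spend) in customers.items()
}
print(segments)
```

Because the ranges ignore where customers actually sit on each axis, nothing stops most of the database from landing in a single cell, which is the imbalance described above.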

Method 2. Distribution by equal number of customers

In this method, we create our groups such that each segment contains the same number of customers. Using the same data as before, and still with three ranges on each axis, we get the following segments:
Distribution by equal number of customers
With division by equal numbers of customers, we can have customers who've spent $20 occupying the same segment as customers who've spent $300.
Advantages:
  • Easy to automate.
  • Generally no serious imbalance between groups.
Disadvantages:
  • Doesn’t identify “special” clients. In our example above we have customers worth $20 in the same segment as customers worth $300, so the “big spenders” are not isolated in their own segment like in the first method.
  • Same number of groups for each dimension.
  • Lots of segments.
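For comparison, here is a rough sketch of splitting one dimension into groups of equal size by rank. The function name equal_count_groups and the spend figures are made up for illustration. Tied values may end up in different groups in this simple version.

```python
def equal_count_groups(values, groups=3):
    """Assign each value a 0-based group so that groups hold (nearly) equal counts."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    assignment = [0] * len(values)
    for rank, i in enumerate(order):
        assignment[i] = rank * groups // len(values)
    return assignment

# Hypothetical total spend per customer, in dollars
spend = [5, 10, 12, 18, 20, 25, 40, 150, 300]
print(equal_count_groups(spend))
```

With these numbers the top group contains the customers who spent $40, $150 and $300, which illustrates how big spenders get diluted when only headcount decides the boundaries.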

Method 3. Do it manually

An analyst studies the database and decides on the best way to split things up.
Advantages:
  • Good segmentation.
Disadvantages:
  • Requires an analyst.
  • Takes a lot of time.

One-button RFM analysis

We decided to do away with all the disadvantages of these methods, and that's where the machine learning algorithms come in. We use clustering to determine automatically how many customer segments are in the database and what those segments are. A decision tree process then tidies the results up visually, assigning outliers to appropriate segments. Running this on our sample database gave us this result:
Machine Learning
We designed a report to go with it that (we hope) clearly explains the results in terms of what marketers need to know. The report is generated with a click of a button, consists of three tables and fits on a single page.
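The article does not say which clustering or decision tree algorithms the tool uses, so the following is only a toy sketch of the general idea: cluster customers first, then fit a small tree to the cluster labels so that every segment gets simple, axis-aligned boundaries. Plain k-means and a hand-written Gini-based tree stand in for whatever is used in practice. The number of clusters is fixed by hand here (the real tool determines it automatically), the data is invented, and real data would need the two axes scaled before clustering.

```python
from collections import Counter

def kmeans(points, centers, iterations=20):
    """Plain Lloyd's k-means. Returns (labels, centers)."""
    for _ in range(iterations):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))
            for point in points
        ]
        # Move each center to the mean of its members.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def build_tree(points, labels, depth=0, max_depth=3):
    """Return a leaf (cluster label) or a node (dimension, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for dim in range(len(points[0])):
        values = sorted(set(p[dim] for p in points))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[dim] <= threshold]
            right = [lab for p, lab in zip(points, labels) if p[dim] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, dim, threshold)
    if best is None:  # all points identical, nothing to split on
        return majority
    _, dim, threshold = best
    left_idx = [i for i, p in enumerate(points) if p[dim] <= threshold]
    right_idx = [i for i, p in enumerate(points) if p[dim] > threshold]
    return (
        dim,
        threshold,
        build_tree([points[i] for i in left_idx], [labels[i] for i in left_idx],
                   depth + 1, max_depth),
        build_tree([points[i] for i in right_idx], [labels[i] for i in right_idx],
                   depth + 1, max_depth),
    )

def predict(tree, point):
    """Walk the tree until a leaf (a cluster label) is reached."""
    while isinstance(tree, tuple):
        dim, threshold, left, right = tree
        tree = left if point[dim] <= threshold else right
    return tree

# Hypothetical customers: (days since last purchase, total spend in dollars)
points = [(5, 20), (10, 30), (8, 25), (12, 15),
          (7, 250), (15, 280), (10, 260),
          (200, 30), (220, 20), (210, 40)]
labels, centers = kmeans(points, centers=[points[0], points[4], points[7]])
tree = build_tree(points, labels)
print(labels)
print(tree)
```

The tree turns free-form clusters into rules of the form "recency up to some number of days, spend above some amount". Any customer, including an outlier far from every cluster center, falls on one side of each threshold and so ends up in exactly one segment.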

Part 1. Database health report

The first table summarises information about all the segments extracted by the RFM analysis. Key metrics for each segment are customer activity level and value. Activity level is based on the recency of the customer's last purchase, and value is based on the amount spent. Each segment falls into one of a number of categories. Each category can contain any number of segments, or none at all. The table cells show the total number of customers in all segments in the category.
Nine segment categories
The Activity Level and Value metrics give us nine segment categories, plus the Never Purchased category
Note: “Lapsed” and “Lapse risk” as used here mean “Customers who have not purchased in a long time” and “Customers whose last purchase recency corresponds to the average” respectively, and do not indicate “lapsing” in the traditional sense of the word. Similarly, “Active” means “Customers who recently made a purchase”.
In our example more than 80% of customers in the database have not made any purchases (we'll continue to refer to them as “customers” for simplicity). Almost a third of the high-value customers have lapsed and approximately another third are at risk of lapsing.
The database health check helps us identify the most important category to work with. To demonstrate how to use the report, let's take a closer look at the high-value customers (i.e. those who spent the most money).
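The cell totals in the health report are simply segment sizes summed per category. A minimal sketch, assuming each segment has already been given an activity level and a value level (the segment list, the "Low" label and the function name health_table are invented for illustration):

```python
from collections import Counter

# Hypothetical segments: (activity level, value level, number of customers)
segments = [
    ("Active", "High", 120),
    ("Active", "High", 80),
    ("Lapse risk", "High", 150),
    ("Lapsed", "Low", 400),
]

def health_table(segments):
    """Sum customers over all segments that share an (activity, value) category."""
    cells = Counter()
    for activity, value, customers in segments:
        cells[(activity, value)] += customers
    return cells

cells = health_table(segments)
print(cells[("Active", "High")])
print(cells[("Lapsed", "High")])  # a category with no segments counts as zero
```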

Part 2. Analysing segments

The second table shows the size of each segment, its revenue (the sum value of all purchases made by customers in that segment) and the average spend per purchase. All the segments are presented in a list. Here's a list of all the segments containing only customers that made purchases:
12 customer segments
We have 12 segments of purchasing customers in our database
We can add a filter to show only high-value customers.
Filter configuration
Filtering results to show only high-value customer segments
After filtering we're left with seven high-value customer segments.
Filtered results
Filtered results showing only high-value customer segments
Now we can draw some important conclusions. Segment #2 accounts for significantly more revenue than the others, despite having a fairly modest average spend. We can surmise that customers in this segment are very loyal and make a lot of purchases on a regular basis. Without having to worry about them lapsing we can safely send them emails about, for example, new arrivals.
Now let's look at the average spend. Segment #7 has the largest average spend and is categorised as having lapsed. Segment #9 has the second highest average spend and is at risk of lapsing. Customers in these segments are willing to make large purchases, but haven't done so in a long time. It might make sense to try and reactivate them by sending them a discount code or a newsletter.
With proper analysis we can identify which segments deserve the most attention and effort.
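The three figures in the second table (segment size, revenue and average spend per purchase) could be computed along these lines. The segment names, customer IDs and amounts are invented, and segment_summary is a hypothetical helper, not part of the Mindbox report.

```python
def segment_summary(purchases):
    """purchases: iterable of (segment, customer_id, amount)."""
    totals = {}
    for segment, customer_id, amount in purchases:
        row = totals.setdefault(
            segment, {"customers": set(), "revenue": 0.0, "purchases": 0}
        )
        row["customers"].add(customer_id)
        row["revenue"] += amount
        row["purchases"] += 1
    return {
        segment: {
            "size": len(row["customers"]),                       # distinct customers
            "revenue": row["revenue"],                           # sum of all purchases
            "average_spend": row["revenue"] / row["purchases"],  # per purchase
        }
        for segment, row in totals.items()
    }

# Hypothetical purchases: segment A buys often and cheaply, segment B rarely and big.
purchases = [
    ("A", "c1", 20.0), ("A", "c1", 30.0), ("A", "c1", 25.0),
    ("A", "c2", 25.0), ("A", "c2", 20.0), ("A", "c2", 30.0),
    ("B", "c3", 60.0), ("B", "c4", 80.0),
]
summary = segment_summary(purchases)
print(summary["A"])
print(summary["B"])
```

In this toy data segment A brings in more revenue than segment B even though its average spend is much lower, the same pattern the report shows for segment #2.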

Part 3. Detailed summary

The last table shows the segment boundaries and average value on each of the R, F and M dimensions.
Segment drilldown
The detailed summary provides additional information about selected segments. We can see here that customers in segment #2 each made an average of 12 purchases, much more than in other segments.
Now we decide which segment we want to focus on first. Let's start with the segments that have the largest average spend: segments #7 and #9.
Customers in segment #7 haven't made a purchase in almost a year, so getting them back won't be easy. However, since the average number of purchases in this segment is 2.1 we can assume that they weren't disappointed with their first purchase. A nice discount may well bring them back.
Segment #9 will be easier to deal with. The average purchase recency here is just three months, and customers in this segment made an average of 2.8 purchases. It's safe to assume that these customers are fairly loyal and don't need any special treatment, but an email with an advert or a small discount to keep our brand fresh in their minds wouldn't go amiss.
Once we've selected the segments we want to work with and decided what to do with them, we can get down to launching some campaigns.
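The boundaries and averages in the detailed summary amount to a minimum, maximum and mean per dimension for each segment. A small sketch with invented numbers (segment_profile is a hypothetical helper):

```python
def segment_profile(rows):
    """rows: list of (recency_days, frequency, monetary), one tuple per customer."""
    profile = {}
    names = ("recency_days", "frequency", "monetary")
    # zip(*rows) turns the per-customer tuples into one column per dimension.
    for name, column in zip(names, zip(*rows)):
        profile[name] = {
            "min": min(column),
            "max": max(column),
            "mean": sum(column) / len(column),
        }
    return profile

# Hypothetical customers of a single segment
rows = [(80, 2, 150.0), (90, 3, 200.0), (100, 4, 250.0)]
profile = segment_profile(rows)
print(profile["recency_days"])
```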

The real Big Green Button is on its way

We created an automated RFM segmentation tool and we're happy with the results. It now only takes 20 seconds of our time to segment our database with optimal distribution among segments. Our next goal is to save even more of your time by automatically configuring marketing campaigns for those segments. We'll be sad to say farewell to our RFM report (nobody will need it any more) but we can't let sentiment stand in the way of progress.

The Mindbox team

Emile Feldman, Machine learning specialist
Lana Shakirova, Content marketer
The following case study is from Mindbox, the original brand behind Maestra’s technology