Everybody can list several high-level customer segments that are relevant to their business: loyal customers, customers who systematically hunt for the best deals, and customers who have not been active for quite a while, to name just a few. Although we know these customers exist, we do not know who they are on an individual level. We do not know who our loyal customers are, and, more importantly, we do not know how many high-level customer segments exist for our business. By using your data smartly, you can overcome these challenges by combining a clustering technique with an RFM model.
An RFM model is popular for segmentation in marketing because it is a simple and effective way to create segments based on buying behaviour (Link). Many more characteristics could be considered, but since our goal is to create high-level segments, we primarily use the metric that matters most for most businesses: transactions. More specifically, the transactional data used in the model consists of the following three variables:
“Recency” – the number of days since the last purchase
“Frequency” – the total number of purchases
“Monetary” – the customer lifetime value (CLV). This could simply be total turnover or a more sophisticated definition
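As a minimal sketch of how these three raw values could be computed, assume a pandas DataFrame of transactions with the hypothetical columns `customer_id`, `order_date` and `amount` (one row per order, net turnover), and use total turnover as the simple CLV definition. The snapshot date and the toy orders below are made up for illustration.

```python
import pandas as pd


def compute_rfm(transactions: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Raw Recency, Frequency and Monetary values per customer.

    Assumes hypothetical columns: customer_id, order_date (datetime64) and
    amount (net turnover of one order). Monetary uses the simplest CLV
    definition: total turnover.
    """
    per_customer = transactions.groupby("customer_id")
    return pd.DataFrame({
        # Days between the snapshot date and the customer's most recent purchase
        "recency": (snapshot_date - per_customer["order_date"].max()).dt.days,
        # Total number of purchases (one row = one order)
        "frequency": per_customer.size(),
        # Total turnover as a simple stand-in for customer lifetime value
        "monetary": per_customer["amount"].sum(),
    })


# Toy example with made-up orders
transactions = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2022-11-20",
        "2023-02-10", "2023-02-20", "2023-03-05",
    ]),
    "amount": [50.0, 20.0, 100.0, 30.0, 15.0, 45.0],
})
rfm = compute_rfm(transactions, pd.Timestamp("2023-03-10"))
print(rfm)
```

A fancier CLV definition would only change the `monetary` line; recency and frequency stay the same.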
These raw RFM values make it difficult to create segments, because each of the three variables can take many different values. To simplify this, each customer is assigned RFM scores. Each customer gets a “Recency” score between 1 and 5, in which 5 means very recent and 1 means very long ago. The same principle applies to the other two variables (see figure 1). The thresholds for these RFM scores are determined using a binning algorithm that strives to create 5 groups of as equal a size as possible. This already simplifies the segmentation considerably, but there are still 125 (5x5x5) possible segments. To reduce this number, we apply a clustering technique that creates fewer segments in a data-driven way.
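The scoring step could look roughly like the sketch below. It uses pandas `qcut` (quantile binning) as an assumed stand-in for the binning algorithm and runs on randomly generated raw values; the function name `rfm_scores` is hypothetical. Ranking the values first breaks ties, so every score gets the same number of customers even when many customers share the same purchase count.

```python
import numpy as np
import pandas as pd


def rfm_scores(rfm: pd.DataFrame, n_bins: int = 5) -> pd.DataFrame:
    """Turn raw RFM values into 1..n_bins scores with quantile binning.

    Ranking first (ties broken by order of appearance) guarantees that qcut
    can always form n_bins groups of (nearly) equal size.
    """
    ranked = rfm[["recency", "frequency", "monetary"]].rank(method="first")
    ascending = list(range(1, n_bins + 1))
    return pd.DataFrame({
        # Fewer days since the last purchase is better, so labels are reversed
        "R": pd.qcut(ranked["recency"], n_bins, labels=ascending[::-1]).astype(int),
        "F": pd.qcut(ranked["frequency"], n_bins, labels=ascending).astype(int),
        "M": pd.qcut(ranked["monetary"], n_bins, labels=ascending).astype(int),
    }, index=rfm.index)


# Made-up raw RFM values for 100 customers
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "recency": rng.integers(1, 365, size=100),
    "frequency": rng.integers(1, 20, size=100),
    "monetary": rng.normal(500, 200, size=100).round(2),
}, index=[f"cust_{i}" for i in range(100)])

scores = rfm_scores(rfm)
print(scores["R"].value_counts().sort_index())
# Every customer now falls into one of at most 5 x 5 x 5 = 125 R-F-M cells
print("distinct R-F-M combinations:", len(scores.drop_duplicates()))
```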
Clustering is a powerful unsupervised machine learning technique that finds naturally occurring groups based on similar patterns in the data. The algorithm creates clusters that differ strongly from each other in terms of buying behaviour, while making sure that the customers within the same cluster are very similar. As humans, we cannot determine how many segments exist when facing large datasets. Luckily, this can be determined in a data-driven way using the elbow method. The elbow method plots the difference between the clusters (blue line in figure 2) and the difference within the clusters (red line in figure 2) for an increasing number of clusters, which enables you to select the ideal number: the point beyond which adding more clusters no longer improves the result substantially. In this case, we selected 7 as the ideal number of clusters, since at that point the blue line is high and the red line is low. Finally, we let the k-means algorithm create the 7 clusters and assign the cluster number back to each individual customer.
Figure 2: Elbow method used to determine the number of K-means clusters
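A sketch of the elbow computation and the final k-means fit with scikit-learn, on randomly generated (hypothetical) RFM scores. Here the within-cluster sum of squares (`inertia_`) plays the role of the red line and the between-cluster sum of squares (total minus within) the role of the blue line; this is one common way to quantify the two differences, not necessarily the exact metric behind figure 2. Because random scores have no real cluster structure, the printed curve only illustrates the mechanics.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_curve(X: np.ndarray, k_values, random_state: int = 0):
    """Within- and between-cluster sums of squares for each k.

    For k-means, total sum of squares = within + between, so the
    between-cluster part is derived from the fitted model's inertia_.
    """
    X = np.asarray(X, dtype=float)
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    within, between = [], []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        within.append(km.inertia_)              # spread inside clusters ("red line")
        between.append(total_ss - km.inertia_)  # separation between clusters ("blue line")
    return within, between


# Hypothetical R, F, M scores (1..5) for 500 customers
rng = np.random.default_rng(42)
scores = rng.integers(1, 6, size=(500, 3)).astype(float)

ks = list(range(1, 11))
within, between = elbow_curve(scores, ks)
for k, w, b in zip(ks, within, between):
    print(f"k={k:2d}  within={w:8.1f}  between={b:8.1f}")

# After reading the elbow plot, fit the chosen k and label every customer
kmeans = KMeans(n_clusters=7, n_init=10, random_state=0).fit(scores)
cluster = kmeans.labels_ + 1  # cluster numbers 1..7 instead of 0..6
```

In practice you would plot `within` and `between` against `ks` and read off the elbow; `n_clusters=7` simply mirrors the choice made above. Since all three scores share the same 1 to 5 scale, no extra standardization is applied here.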
In-depth Cluster Analyses
Now that we have created 7 naturally occurring customer segments, we need to define what type of customer each cluster represents. This step blends your business knowledge, the segments you think exist, and interpretation of the data. The first step is to understand the size of each cluster, which is easily read from the graph below.
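The size of each cluster, together with its average scores that drive the interpretation of each segment, can be summarized with a simple pandas `groupby`. The data below is a made-up toy example, and the column names (`R`, `F`, `M`, `cluster`) and the function name `cluster_profile` are assumptions:

```python
import pandas as pd


def cluster_profile(customers: pd.DataFrame, cluster_col: str = "cluster") -> pd.DataFrame:
    """Size, share and average R/F/M score per cluster, largest cluster first."""
    profile = customers.groupby(cluster_col).agg(
        size=("R", "size"),
        mean_R=("R", "mean"),
        mean_F=("F", "mean"),
        mean_M=("M", "mean"),
    )
    # Fraction of all customers that falls into each cluster
    profile["share"] = profile["size"] / profile["size"].sum()
    return profile.sort_values("size", ascending=False)


# Toy data: scores per customer plus the k-means cluster number
customers = pd.DataFrame({
    "R": [5, 5, 4, 1, 2, 1, 5, 4],
    "F": [5, 4, 5, 1, 1, 2, 5, 5],
    "M": [5, 5, 4, 1, 2, 1, 1, 1],
    "cluster": [4, 4, 4, 2, 2, 2, 7, 7],
})
print(cluster_profile(customers).round(2))
```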
For example, cluster 4 contains customers who purchased very recently and very frequently and have a large CLV: these are clearly the loyal customers. By contrast, cluster 7 contains customers who purchased recently and very frequently but have a strongly negative customer lifetime value. These are the wardrobers, who return products often, hence the negative CLV. The same interpretation step is carried out for every cluster. Additionally, for each customer segment we define a high-level marketing action that is integrated into the overall cluster strategy. See figure 4 for an example output of the customer segments.
A sleeping customer can become a loyal champion when triggered by your campaigns to start purchasing again. This change in purchasing behaviour directly affects the segment the customer belongs to, so the segment label has to be updated. In other words, existing customers with new purchases, as well as new customers in general, need to be assigned or reassigned to their corresponding segment. This step is crucial because it keeps the segments used in your marketing strategy up to date. We manage this updating process by retrieving all new transactions from the CRM on a weekly basis and then determining whether each individual customer has the correct segment label. This is determined using a classification tree, which summarizes the buying-behaviour conditions that define each customer segment. Once the new buying-behaviour data has been processed, all email lists in the advertising platforms are updated through the available APIs, which eliminates manual work.
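A minimal sketch of the classification-tree step with scikit-learn's `DecisionTreeClassifier`, assuming the R, F, M scores are the tree's inputs and the k-means cluster numbers its target. The data is randomly generated and the weekly batch `this_week` is hypothetical; `export_text` prints the learned conditions per segment. Pushing the results to email lists is platform-specific and left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ["R", "F", "M"]

# Hypothetical R, F, M scores for the customers that were clustered initially
rng = np.random.default_rng(1)
X = rng.integers(1, 6, size=(500, 3)).astype(float)
segments = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(X) + 1

# Learn explicit if/else rules that reproduce the cluster labels.
# Setting max_depth would give shorter, more readable rules at the cost of
# some disagreement with the original k-means labels.
tree = DecisionTreeClassifier(random_state=0).fit(X, segments)
print(export_text(tree, feature_names=FEATURES))

# Weekly update (sketch): recompute the scores of customers with new
# transactions, then let the tree assign their current segment
this_week = np.array([[5.0, 5.0, 5.0], [1.0, 1.0, 1.0], [3.0, 2.0, 4.0]])
print(tree.predict(this_week))
```

One plausible reason to classify with a fixed tree instead of re-running k-means every week is stability: a new k-means run can shift cluster boundaries and renumber the clusters, whereas the tree keeps the segment definitions fixed and readable. Scoring new customers would also require the thresholds from the original binning step, which this sketch does not cover.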
Any questions? Or do you want to know more about working at Expand Online? Check with Job; he can tell you all about it.