Customer Segmentation Made Easy with RFM

Why RFM Matters for Customer Insights

In today’s competitive landscape, understanding customer behavior is crucial for sustained success. While many companies track sales and conversion metrics, they often struggle to pinpoint which customers are truly driving growth. Website behavior data can offer insights into actions taken, but it doesn’t always reveal the underlying purchasing patterns that can elevate business outcomes. RFM (Recency, Frequency, Monetary) analysis addresses this gap by segmenting customers based on their buying habits: how recently they purchased, how frequently they purchase, and how much they spend. This straightforward yet powerful model groups customers into actionable segments, providing a solid foundation for more targeted and effective marketing strategies.

Breaking Down the RFM Model

The RFM model is a powerful tool that helps businesses categorize customers based on their purchasing habits, providing insight into which customers are most valuable and predicting how new customers might behave. By evaluating Recency, Frequency, and Monetary (RFM) factors, companies can identify their best customers and tailor their marketing efforts accordingly.

The model assigns each customer a score, typically from 1 to 5, across three key dimensions:

Recency: This measures how recently a customer made their last purchase. Generally, customers who made a recent purchase are more likely to buy again. The timeframe for measuring recency should align with your business model; for example, a car dealership might consider a customer "recent" if they purchased a car within the last few years, whereas a retail store might look at weeks or months.

Frequency: This evaluates how often a customer makes purchases, helping to identify repeat buyers. High-frequency customers are often brand loyal and more likely to keep shopping with you after their initial purchase, making frequency a crucial metric for gauging long-term engagement.

Monetary: This assesses how much a customer spends over a specified period. While high-value customers may purchase less frequently, their spending on premium products often makes them highly valuable. Understanding monetary value helps businesses recognize which customers are making the most significant contributions to revenue.

By scoring customers across these dimensions, businesses can segment their audience more effectively and prioritize high-value groups for targeted marketing. Most RFM models use a 1–5 scale for scoring, but this can be customized to allow for more nuanced and precise customer segmentation.

Guide to Building Your RFM Model

Preparing Data for RFM Analysis

To ensure accurate RFM (Recency, Frequency, Monetary) analysis, your dataset should contain the following key components:

  1. Customer Identifiers: Include a unique identifier for each customer, such as customer_id, to differentiate individual purchase behaviors.

  2. Purchase Transaction Data: Gather essential details for each transaction, including:

    • transaction_id: A unique ID for each transaction.

    • purchase_date: The date the purchase occurred.

    • purchase_amount: The total amount spent in each transaction.

  3. Aggregating Data: Group the data by purchase date, customer_id, and transaction_id, and sum purchase_amount so that each row represents one transaction. Ensure your aggregated data has columns for:

    • transaction_date

    • customer_id

    • transaction_id

    • total_revenue (sum of purchase_amount per transaction)

  4. Final Table Example: Your table should look like this:
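As a hypothetical illustration of the aggregation described in item 3, the sketch below builds the four-column table from line-level purchases. The rows are invented sample data, and the variable names purchases and transactions are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical line-level purchases (invented sample rows, not real data)
purchases = pd.DataFrame({
    "customer_id": ["C001", "C001", "C002", "C002", "C002"],
    "transaction_id": ["T100", "T100", "T200", "T201", "T201"],
    "purchase_date": pd.to_datetime(
        ["2024-01-05", "2024-01-05", "2024-02-10", "2024-03-01", "2024-03-01"]
    ),
    "purchase_amount": [20.0, 15.5, 40.0, 10.0, 5.0],
})

# One row per transaction: group by date, customer, and transaction, then sum
transactions = (
    purchases
    .groupby(["purchase_date", "customer_id", "transaction_id"], as_index=False)
    .agg(total_revenue=("purchase_amount", "sum"))
    .rename(columns={"purchase_date": "transaction_date"})
)

print(transactions)
```

Printing transactions shows one row per transaction_id with the columns transaction_date, customer_id, transaction_id, and total_revenue.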

Set up the RFM model

Step 1: Load and prep the data

This step initializes the dataset by loading and preparing it for RFM calculations:

  • Set Data Types: Specifies data types for key columns, ensuring customer_id and transaction_id are read as strings, which prevents unintended numerical operations.

  • Load and Parse Date Column: Reads data from a CSV file (rfm_data.csv) and parses the date column as a datetime object, which is essential for accurately calculating recency.

  • Create Working Copy: Makes a working copy (sales_df) of the original dataset (sales_data) for analysis, preserving the raw data and allowing flexibility in data manipulation without altering the source.

This setup prepares the dataset for efficient processing and calculation of RFM metrics.

import pandas as pd
from datetime import datetime

# Set data types for columns
data_types = {
    'customer_id': str,
    'transaction_id': str
}

# Load the data and parse the date column
sales_data = pd.read_csv('rfm_data.csv', dtype=data_types,
                         parse_dates=['date'])
sales_df = sales_data.copy()  # Create a working copy of the original data

Step 2: Select Columns for RFM Analysis

In this step, we filter the dataset to retain only the columns essential for RFM analysis:

  • transaction_id: Unique identifier for each transaction, allowing us to count the number of transactions per customer for frequency calculations.

  • customer_id: Unique identifier for each customer, ensuring each customer’s activity is grouped correctly.

  • date: Transaction date (transaction_date in the prepared table above), used to calculate recency by determining how many days have passed since the last purchase.

  • revenue: Transaction revenue amount (total_revenue in the prepared table above), used to calculate the total spending per customer for the monetary metric.

By isolating these columns, we streamline the dataset and prepare it specifically for calculating Recency, Frequency, and Monetary values.

# Select relevant columns
columns = ['transaction_id', 'customer_id', 'date', 'revenue']
df_dataset = sales_df[columns]

Step 3: Calculate RFM Metrics

This step sets up a reference date (typically today's date) for calculating recency and then groups customer data to compute each RFM metric:

  • Recency: For each customer, this is calculated as the number of days since their most recent transaction. A lower recency value indicates recent activity, making these customers more relevant for engagement efforts.

  • Frequency: Counts the total number of transactions each customer has completed, capturing how often they engage. Frequent transactions signal higher engagement or loyalty.

  • Monetary: Sums the total revenue generated by each customer, representing their financial contribution.

By grouping transactions at the customer level, this step creates a foundational RFM dataset with one row per customer and a raw value for each metric, ready for scoring and segmentation.

# Define a reference date for recency calculation
today_date = pd.to_datetime(datetime.today().strftime('%Y-%m-%d'))

# Group by customer and calculate Recency, Frequency, and Monetary values
rfm_dataset = df_dataset.groupby('customer_id').agg(
    {
        'date': lambda v: (today_date - v.max()).days,  # Recency
        'transaction_id': 'count',                     # Frequency
        'revenue': 'sum'                               # Monetary
    }
).rename(columns={'date': 'recency', 'transaction_id': 'frequency', 'revenue': 'monetary'})
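To make the three calculations concrete, here is a minimal sketch on invented sample transactions. It uses a fixed reference date instead of today's date so the numbers are reproducible; the names toy, reference_date, and toy_rfm are assumptions for this illustration.

```python
import pandas as pd

# Invented sample transactions (not real data)
toy = pd.DataFrame({
    "transaction_id": ["T1", "T2", "T3", "T4"],
    "customer_id": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-15", "2024-02-20"]),
    "revenue": [50.0, 25.0, 10.0, 30.0],
})

# Fixed reference date (an assumption) so the recency values do not change day to day
reference_date = pd.Timestamp("2024-03-31")

toy_rfm = toy.groupby("customer_id").agg(
    recency=("date", lambda v: (reference_date - v.max()).days),  # days since last purchase
    frequency=("transaction_id", "count"),                        # number of transactions
    monetary=("revenue", "sum"),                                  # total spend
)

print(toy_rfm)
```

Customer A last purchased on 2024-03-01, so recency is 30 days, with a frequency of 2 and a monetary value of 75.0. Customer B last purchased on 2024-02-20, giving a recency of 40 days, a frequency of 2, and a monetary value of 40.0.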

Step 4: Score Each RFM Metric

Here, we use quantile-based scoring to rank each RFM metric on a scale from 1 to 5, allowing for a standardized comparison across customers:

  • Recency (R): Assigned using quantiles with the labels reversed, so a higher recency score (5) indicates more recent activity, while a lower score (1) indicates a longer time since the last purchase. This keeps all three scores pointing the same way (5 is best) and helps prioritize recently active customers.

  • Frequency (F): Higher scores (5) represent customers with more frequent purchases, and lower scores (1) indicate infrequent activity. Frequent customers are valuable for loyalty and retention strategies.

  • Monetary (M): Scores are based on spending, with higher scores (5) denoting high-value customers and lower scores (1) reflecting lower spending. This metric helps identify the most financially valuable customers.

The resulting R, F, and M scores are appended to the dataset, creating an RFM profile for each customer that can guide segmentation and targeted marketing strategies.

# Scoring: Assign quantile-based scores for each RFM metric
r = pd.qcut(rfm_dataset['recency'], q=5, labels=range(5, 0, -1))  # High score = recent activity (fewest days)
f = pd.qcut(rfm_dataset['frequency'], q=5, labels=range(1, 6))    # High score = frequent activity
m = pd.qcut(rfm_dataset['monetary'], q=5, labels=range(1, 6))     # High score = higher spending

# Append the scores to the RFM dataset as integers so they can be summed and compared
rfm = rfm_dataset.assign(R=r.astype(int), F=f.astype(int), M=m.astype(int))
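One practical caveat with quantile-based scoring: pd.qcut raises a ValueError when many customers share the same value (for example, most customers having exactly one purchase), because the quantile bin edges are no longer unique. A common workaround, sketched below on invented data, is to rank the values first so that ties are broken by row order. This is an optional variation, not part of the steps above, and the names frequency, ranked, and f_score are assumptions for the example.

```python
import pandas as pd

# Invented purchase counts with many ties at 1 (not real data)
frequency = pd.Series(
    [1, 1, 1, 1, 1, 1, 2, 2, 3, 7],
    index=[f"C{i}" for i in range(10)],
    name="frequency",
)

# Direct quantile binning fails here because several bin edges are identical
try:
    pd.qcut(frequency, q=5, labels=range(1, 6))
except ValueError as err:
    print("qcut failed:", err)

# Ranking with method="first" gives every customer a distinct rank,
# so the five bins can always be formed
ranked = frequency.rank(method="first")
f_score = pd.qcut(ranked, q=5, labels=range(1, 6)).astype(int)

print(f_score)
```

The trade-off is that customers with identical frequency can end up with different scores depending on their position in the table, so this sketch favors evenly sized score groups over strict tie handling.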

Step 5: Calculate RFM Group and Total Score

In this step, we consolidate the individual RFM scores into two summary metrics.

RFM Group: Combines the Recency (R), Frequency (F), and Monetary (M) scores into a single identifier, such as "5-3-4". This grouping helps to quickly categorize customers based on specific RFM score combinations, making it easier to identify patterns at a glance.

RFM Total Score: Sums the R, F, and M scores into a single numeric value. This aggregate score provides an overall measure of customer value, with higher totals indicating more engaged or valuable customers.

# Combine RFM scores into a single score
rfm['rfm_group'] = rfm[['R', 'F', 'M']].apply(lambda v: '-'.join(v.astype(str)), axis=1)
rfm['rfm_score_total'] = rfm[['R', 'F', 'M']].astype(int).sum(axis=1)  # Cast to int: categorical scores cannot be summed

Step 6: Segment Customers Based on RFM Scores

Bucket customer scores into 10 commonly used industry segments, providing a solid foundation to inform targeted marketing strategies. Each segment represents a unique customer behavior pattern and enables tailored marketing actions—such as rewarding high-value customers or re-engaging those at risk of churning.

These default thresholds can be adjusted based on your business's unique customer profiles and objectives, allowing you to optimize strategies for retention, loyalty, or acquisition efforts.

# Define function to map RFM scores to segments
def map_rfm_segment(row):
    if row['R'] == 5 and row['F'] == 5 and row['M'] == 5:
        return 'Champions'
    elif row['R'] >= 3 and row['F'] == 5 and row['M'] >= 4:
        return 'Loyal Customers'
    elif row['R'] >= 4 and row['F'] >= 4 and row['M'] >= 3:
        return 'Potential Loyalists'
    elif row['R'] == 5 and row['F'] <= 2 and row['M'] <= 2:
        return 'New Customers'
    elif row['R'] == 5 and row['F'] <= 3 and row['M'] <= 3:
        return 'Promising'
    elif row['R'] >= 3 and row['F'] >= 2 and row['M'] >= 2:
        return 'Need Attention'
    elif row['R'] <= 2 and row['F'] >= 4 and row['M'] >= 4:
        return 'At Risk'
    elif row['R'] <= 3 and row['F'] <= 2 and row['M'] == 5:
        return "Can't Lose Them"
    elif row['R'] <= 2 and row['F'] <= 2 and row['M'] <= 2:
        return 'Hibernating'
    else:
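        return 'Lost'

# Apply the mapping row by row to assign each customer a segment
rfm['segment'] = rfm.apply(map_rfm_segment, axis=1)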

Review Results

The final output table will provide a comprehensive view of each customer, including their customer_id, the calculated RFM scores, and the associated group scores. Additionally, the table will include the RFM segment that the customer has been assigned to, which categorizes them based on their behavior (e.g., "Champions," "Loyal Customers," "At Risk," etc.). This segmentation enables quick and effective filtering for downstream marketing strategies and targeted actions. The table will serve as a powerful tool for customer analysis and segmentation, helping to easily identify high-value customers, at-risk segments, and opportunities for engagement.

The final output will look similar to the following:

This output provides a structured overview that is easily interpretable for decision-making and customer lifecycle management.
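For a quick review of the results, a per-segment summary can be helpful. The sketch below assumes a scored table with a segment column (that column name is an assumption) and uses invented rows in place of real output.

```python
import pandas as pd

# Invented scored customers standing in for the final RFM table (not real data)
scored = pd.DataFrame({
    "customer_id": ["A", "B", "C", "D", "E"],
    "monetary": [900.0, 850.0, 120.0, 60.0, 400.0],
    "rfm_score_total": [15, 15, 9, 4, 10],
    "segment": ["Champions", "Champions", "Need Attention", "Hibernating", "At Risk"],
})

# Count customers, total spend, and average total score for each segment
segment_summary = (
    scored.groupby("segment")
    .agg(
        customers=("customer_id", "count"),
        total_monetary=("monetary", "sum"),
        avg_rfm_score=("rfm_score_total", "mean"),
    )
    .sort_values("total_monetary", ascending=False)
)

print(segment_summary)
```

Sorting by total spend puts the segments that contribute the most revenue at the top, which is one reasonable starting point for deciding where to focus first.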

How to Use RFM Insights for Effective Customer Segmentation

RFM insights and industry-standard segments are powerful tools for refining customer segmentation and driving targeted marketing strategies. By leveraging segments such as "Champions," "Loyal Customers," and "At Risk," businesses can tailor their campaigns to meet the unique needs of each group. For example, "Champions" are highly engaged, high-value customers who have purchased recently, buy frequently, and spend the most. These customers are ideal targets for loyalty programs, exclusive offers, or referral incentives to maintain their engagement and encourage advocacy. On the other hand, "At Risk" customers, who have a low recency score despite high frequency and monetary scores, can benefit from re-engagement campaigns, special promotions, or personalized messaging aimed at rekindling their interest and reducing churn.

The versatility of RFM segments extends beyond retention strategies; they can be used for acquisition, cross-selling, and upselling. "Potential Loyalists" may be excellent candidates for nurturing, as they show high frequency but lower monetary scores, indicating they are frequent buyers but may not yet be spending at their full potential. Offering upsell opportunities or targeted bundles could increase their spend. Meanwhile, "New Customers" who have recently made their first purchase can be nurtured with onboarding experiences, product recommendations, or educational content to foster long-term loyalty. By using these industry-standard segments to guide specific actions, businesses can prioritize their resources more effectively, maximize customer lifetime value, and create more personalized, impactful customer experiences.
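One lightweight way to put these ideas into practice is a lookup from segment name to a suggested campaign. The sketch below is hypothetical: the SEGMENT_ACTIONS dictionary and the suggest_action helper are names invented for this example, and the actions simply paraphrase the suggestions in the text above.

```python
# Hypothetical lookup from segment to suggested marketing actions,
# paraphrasing the examples discussed in this section
SEGMENT_ACTIONS = {
    "Champions": "loyalty program, exclusive offers, referral incentives",
    "At Risk": "re-engagement campaign, special promotions, personalized messaging",
    "Potential Loyalists": "upsell opportunities, targeted bundles",
    "New Customers": "onboarding, product recommendations, educational content",
}


def suggest_action(segment: str) -> str:
    """Return the suggested actions for a segment, or a fallback if none is defined."""
    return SEGMENT_ACTIONS.get(segment, "no default action defined")


print(suggest_action("Champions"))
print(suggest_action("Lost"))
```

Segments without an entry fall back to a placeholder, which makes it easy to spot the groups that still need a defined strategy.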

Conclusion

The key takeaway from RFM models is their ability to provide actionable insights that support data-driven marketing decisions. By identifying high-value customers and predicting future behavior, businesses can effectively prioritize resources for retention, acquisition, and growth. RFM segmentation allows businesses to personalize their approach, whether nurturing new customers, rewarding loyalty, or re-engaging at-risk segments, ultimately empowering organizations to build stronger customer relationships and accelerate long-term growth.

If you’d like to learn more about RFM and how you can leverage this model for your organization, contact Lakefront by selecting Get in Touch below!
