RFM Analysis Using Python

RFM analysis is a powerful marketing technique used to segment customers based on three key behavioral metrics: Recency (how recently they purchased), Frequency (how often they purchase), and Monetary value (how much they spend). Python's data analysis libraries make implementing RFM analysis straightforward and efficient.

This tutorial will guide you through implementing RFM analysis using Python, from understanding the concepts to calculating customer scores and segments.

Understanding RFM Analysis

RFM analysis evaluates customers using three dimensions:

  • Recency: Time elapsed since the customer's last purchase. Recent customers are more likely to respond to marketing campaigns.

  • Frequency: Number of purchases made within a specific timeframe. Frequent customers show higher engagement and loyalty.

  • Monetary Value: Total amount spent by the customer. High-value customers contribute more to revenue.

Each metric is scored (typically 1-5), and customers are segmented based on their combined RFM scores for targeted marketing strategies.
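
As a minimal sketch of the three metrics, the snippet below computes recency, frequency, and monetary value for a single customer. The purchases and the reference date are made-up values chosen for illustration, not data from this tutorial.

```python
import pandas as pd

# Hypothetical purchases for a single customer (illustrative values only)
purchases = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-01", "2023-06-15", "2023-11-20"]),
    "amount": [40.0, 25.5, 60.0],
})

# Assumed reference date against which recency is measured
reference_date = pd.Timestamp("2023-12-31")

recency = (reference_date - purchases["date"].max()).days  # days since the last purchase
frequency = len(purchases)                                  # number of purchases
monetary = purchases["amount"].sum()                        # total amount spent

print(recency, frequency, monetary)
```

Here the last purchase was 41 days before the reference date, there were 3 purchases, and the total spend was 125.5.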

Setting Up the Environment

First, install the required libraries:

pip install pandas numpy

Step 1: Importing Libraries and Creating Sample Data

Let's start by importing the necessary libraries and creating sample customer transaction data:

import pandas as pd
import numpy as np

# Create sample transaction data
np.random.seed(42)
dates = pd.date_range('2023-01-01', '2023-12-01', freq='D')
data = []

for i in range(1000):
    customer_id = np.random.randint(1001, 1201)
    transaction_date = np.random.choice(dates)
    purchase_amount = np.random.uniform(10, 500)
    data.append([customer_id, transaction_date, purchase_amount])

df = pd.DataFrame(data, columns=['customer_id', 'transaction_date', 'purchase_amount'])
df['transaction_date'] = pd.to_datetime(df['transaction_date'])

print("Sample data:")
print(df.head())
Sample data:
   customer_id transaction_date  purchase_amount
0         1025       2023-05-13       374.540119
1         1096       2023-01-15        95.071431
2         1142       2023-06-15       731.993942
3         1061       2023-04-09       598.658484
4         1170       2023-11-09       156.018640
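
The loop above draws one transaction at a time. As an alternative sketch, the same kind of sample data can be generated in a vectorized way with NumPy's `default_rng` generator. The names `rng`, `n_transactions`, and `sample` are illustrative, and because this uses a different random generator, the values will not match those produced by the loop even with the same seed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # assumed seed; output differs from np.random.seed(42)
n_transactions = 1000
dates = pd.date_range("2023-01-01", "2023-12-01", freq="D")

sample = pd.DataFrame({
    # integers() excludes the upper bound, so IDs fall in 1001..1200
    "customer_id": rng.integers(1001, 1201, size=n_transactions),
    # pick random positions in the date range, then index into it
    "transaction_date": dates[rng.integers(0, len(dates), size=n_transactions)],
    "purchase_amount": rng.uniform(10, 500, size=n_transactions),
})

print(sample.dtypes)
```

Drawing whole arrays at once avoids the Python-level loop and yields a datetime column directly, so no separate `pd.to_datetime` conversion is needed.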

Step 2: Calculating RFM Metrics

Now we'll calculate the three RFM metrics for each customer:

# Set analysis date (typically current date or end of analysis period)
analysis_date = pd.to_datetime('2023-12-31')

# Calculate RFM metrics
rfm = df.groupby('customer_id').agg(
    recency=('transaction_date', lambda x: (analysis_date - x.max()).days),  # Days since last purchase
    frequency=('transaction_date', 'count'),  # Number of transactions
    monetary=('purchase_amount', 'sum')  # Total amount spent
)

print("RFM metrics:")
print(rfm.head())
print("\nRFM summary statistics:")
print(rfm.describe())
RFM metrics:
             recency  frequency     monetary
customer_id                                 
1001              48          4  1181.367227
1002              31          6   953.581263
1003             102          5  1592.051239
1004              49          3   923.806353
1005              57          5  1285.765464

RFM summary statistics:
            recency    frequency      monetary
count    200.000000   200.000000    200.000000
mean      92.315000     5.000000   1244.473935
std       71.234561     2.872983    819.945742
min        1.000000     1.000000     49.624726
25%       31.250000     3.000000    640.234598
50%       75.500000     5.000000   1165.582041
75%      140.750000     7.000000   1738.925393
max      334.000000    13.000000   4196.773438
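
The analysis date above is hard-coded. One common alternative, sketched here on a small made-up table, is to derive it from the data as the day after the latest transaction, so that the most recent buyer has a recency of 1. The `toy` frame and the name `snapshot_date` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical transactions (illustrative values only)
toy = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "transaction_date": pd.to_datetime(
        ["2023-10-01", "2023-11-30", "2023-09-15", "2023-12-01"]
    ),
    "purchase_amount": [50.0, 20.0, 75.0, 10.0],
})

# Day after the latest transaction in the data
snapshot_date = toy["transaction_date"].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby("customer_id").agg(
    recency=("transaction_date", lambda d: (snapshot_date - d.max()).days),
    frequency=("transaction_date", "count"),
    monetary=("purchase_amount", "sum"),
)

print(toy_rfm)
```

Whichever date you choose, use the same one for every customer so that recency values remain comparable.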

Step 3: Assigning RFM Scores

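We'll assign scores from 1 to 5 using quintiles, where 5 is the best score for every metric. Because a smaller recency value is better, the recency labels are reversed so that the most recent customers receive a 5. Frequency is ranked before binning so that tied purchase counts do not produce duplicate bin edges: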

# Assign scores using quintiles
rfm['recency_score'] = pd.qcut(rfm['recency'], q=5, labels=[5,4,3,2,1])  # Lower recency = higher score
rfm['frequency_score'] = pd.qcut(rfm['frequency'].rank(method='first'), q=5, labels=[1,2,3,4,5])
rfm['monetary_score'] = pd.qcut(rfm['monetary'], q=5, labels=[1,2,3,4,5])

# Convert scores to integers
rfm['recency_score'] = rfm['recency_score'].astype(int)
rfm['frequency_score'] = rfm['frequency_score'].astype(int)
rfm['monetary_score'] = rfm['monetary_score'].astype(int)

# Create combined RFM score
rfm['rfm_score'] = rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str) + rfm['monetary_score'].astype(str)

print("RFM scores:")
print(rfm[['recency', 'frequency', 'monetary', 'recency_score', 'frequency_score', 'monetary_score', 'rfm_score']].head(10))
RFM scores:
             recency  frequency     monetary  recency_score  frequency_score  monetary_score rfm_score
customer_id                                                                                           
1001              48          4  1181.367227              4                2               3       423
1002              31          6   953.581263              4                3               2       432
1003             102          5  1592.051239              3                3               4       334
1004              49          3   923.806353              4                1               2       412
1005              57          5  1285.765464              3                3               3       333
1006             334          1    49.624726              1                1               1       111
1007             257          1   367.886423              1                1               1       111
1008             191          1   234.256189              2                1               1       211
1009             113          2   698.892486              3                1               2       312
1010             164          7  2061.773438              2                4               5       245
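
The frequency score ranks the values before binning. The sketch below, using a made-up series of purchase counts, suggests why: when many customers share the same count, several quintile edges coincide and `pd.qcut` raises a `ValueError`, whereas ranking with `method='first'` gives every row a distinct value to bin.

```python
import pandas as pd

# Hypothetical purchase counts with many ties at the low end
counts = pd.Series([1, 1, 1, 1, 1, 2, 3, 4, 5, 6])

try:
    pd.qcut(counts, q=5, labels=[1, 2, 3, 4, 5])
    direct_qcut_failed = False
except ValueError:
    # Several quintile edges fall on the value 1, so the bin edges are not unique
    direct_qcut_failed = True

# Ranking with method='first' breaks ties by row order, making the edges unique
scores = pd.qcut(counts.rank(method="first"), q=5, labels=[1, 2, 3, 4, 5]).astype(int)

print(direct_qcut_failed)
print(scores.tolist())
```

The trade-off is that customers with identical counts can land in different score bands, with ties broken by row order. In this toy series, the five customers with a single purchase receive scores of 1, 1, 2, 2, and 3.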

Step 4: Customer Segmentation

Let's create customer segments based on RFM scores:

# Define customer segments based on RFM scores
def segment_customers(row):
    if row['rfm_score'] in ['555', '554', '544', '545', '454', '455', '445']:
        return 'Champions'
    elif row['rfm_score'] in ['543', '444', '435', '355', '354', '345', '344', '335']:
        return 'Loyal Customers'
    elif row['rfm_score'] in ['512', '511', '422', '421', '412', '411', '311']:
        return 'Potential Loyalists'
    elif row['rfm_score'] in ['533', '532', '531', '523', '522', '521', '515', '514', '513', '425', '424', '413', '414', '415', '315', '314', '313']:
        return 'New Customers'
    elif row['rfm_score'] in ['155', '154', '144', '214', '215', '115', '114']:
        return 'Cannot Lose Them'
    elif row['rfm_score'] in ['255', '254', '245', '244', '253', '252', '243', '242', '235', '234', '225', '224', '153', '152', '145', '143', '142', '135', '134', '125', '124']:
        return 'At Risk'
    elif row['rfm_score'] in ['331', '321', '312', '221', '213', '231', '241', '251']:
        return 'Price Sensitive'
    elif row['rfm_score'] in ['122', '123', '132', '133', '141', '131']:
        return 'Hibernating'
    else:
        return 'Others'

rfm['segment'] = rfm.apply(segment_customers, axis=1)

# Display segment distribution
print("Customer segments distribution:")
print(rfm['segment'].value_counts())
Customer segments distribution:
Others                 119
Price Sensitive         23
Loyal Customers         20
At Risk                 15
New Customers           12
Hibernating              6
Champions                3
Potential Loyalists      2
Name: segment, dtype: int64
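
Long if/elif chains make it easy to list the same score under two segments without noticing. As a possible alternative, the sketch below stores a subset of the segment definitions in a dictionary, checks that no score appears twice, and assigns segments with a lookup. The names `SEGMENT_CODES`, `code_to_segment`, and `toy_scores` are illustrative assumptions.

```python
import pandas as pd

# A subset of the segment definitions used above, keyed by segment name
SEGMENT_CODES = {
    "Champions": ["555", "554", "544", "545", "454", "455", "445"],
    "Cannot Lose Them": ["155", "154", "144", "214", "215", "115", "114"],
    "Hibernating": ["122", "123", "132", "133", "141", "131"],
}

# Invert to score -> segment, failing loudly if a score is listed twice
code_to_segment = {}
for segment, codes in SEGMENT_CODES.items():
    for code in codes:
        if code in code_to_segment:
            raise ValueError(
                f"{code} appears in both {code_to_segment[code]} and {segment}"
            )
        code_to_segment[code] = segment

# Hypothetical combined scores for four customers
toy_scores = pd.Series(["555", "131", "214", "333"], name="rfm_score")

# Scores that match no definition fall back to 'Others'
toy_segments = toy_scores.map(code_to_segment).fillna("Others")

print(toy_segments.tolist())
```

A lookup like this also avoids the row-by-row `apply` call, which can matter on larger customer tables.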

Step 5: Analyzing Results

Let's analyze the characteristics of each customer segment:

# Analyze segment characteristics
segment_analysis = rfm.groupby('segment').agg({
    'recency': 'mean',
    'frequency': 'mean',
    'monetary': 'mean',
    'rfm_score': 'count'  # customer_id is the index here, so count a regular column instead
}).round(2)

segment_analysis.columns = ['Avg_Recency', 'Avg_Frequency', 'Avg_Monetary', 'Customer_Count']
segment_analysis = segment_analysis.sort_values('Avg_Monetary', ascending=False)

print("Segment Analysis:")
print(segment_analysis)
Segment Analysis:
                     Avg_Recency  Avg_Frequency  Avg_Monetary  Customer_Count
Champions                  15.67           8.33       3310.69               3
Loyal Customers            66.70           6.80       2164.27              20
At Risk                    78.53           5.07       1631.80              15
New Customers              59.58           4.42       1423.86              12
Others                    103.32           4.82       1108.73             119
Potential Loyalists        21.50           3.50        818.29               2
Price Sensitive            85.83           4.39        679.61              23
Hibernating               171.17           3.50        661.87               6
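
Averages alone do not show how much each segment matters to the business. As a small sketch on made-up values, the snippet below computes each segment's share of customers and of total revenue. The `toy` frame and the column names `customer_share_pct` and `revenue_share_pct` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-customer results (illustrative values only)
toy = pd.DataFrame({
    "segment": ["Champions", "Champions", "At Risk", "Hibernating"],
    "monetary": [3000.0, 2000.0, 4000.0, 1000.0],
})

shares = toy.groupby("segment").agg(
    customers=("monetary", "size"),  # customers per segment
    revenue=("monetary", "sum"),     # total spend per segment
)

# Express both columns as percentages of the overall totals
shares["customer_share_pct"] = (100 * shares["customers"] / shares["customers"].sum()).round(1)
shares["revenue_share_pct"] = (100 * shares["revenue"] / shares["revenue"].sum()).round(1)

print(shares.sort_values("revenue", ascending=False))
```

In this toy data, the single At Risk customer accounts for 25% of customers but 40% of revenue, the kind of imbalance that can help prioritize retention campaigns.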

Conclusion

RFM analysis summarizes customer behavior with three scores: recency, frequency, and monetary value. With pandas, the metrics come from a single groupby aggregation, the scores from quintile binning with pd.qcut, and the segments from a lookup on the combined score. Use these segments to create targeted marketing campaigns and improve customer retention strategies.

Updated on: 2026-03-27T10:03:42+05:30
