RFM Analysis Using Python
RFM analysis is a powerful marketing technique used to segment customers based on three key behavioral metrics: Recency (how recently they purchased), Frequency (how often they purchase), and Monetary value (how much they spend). Python's data analysis libraries make implementing RFM analysis straightforward and efficient.
This tutorial will guide you through implementing RFM analysis using Python, from understanding the concepts to calculating customer scores and segments.
Understanding RFM Analysis
RFM analysis evaluates customers using three dimensions:
Recency: Time elapsed since the customer's last purchase. Recent customers are more likely to respond to marketing campaigns.
Frequency: Number of purchases made within a specific timeframe. Frequent customers show higher engagement and loyalty.
Monetary Value: Total amount spent by the customer. High-value customers contribute more to revenue.
Each metric is scored (typically 1-5), and customers are segmented based on their combined RFM scores for targeted marketing strategies.
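To make the three definitions concrete, here is a minimal sketch that computes them by hand for two customers. The customer labels, amounts, and `reference_date` are hypothetical values chosen for illustration; they are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical transactions for two customers (illustrative values only)
tx = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B"],
    "transaction_date": pd.to_datetime(
        ["2023-10-01", "2023-11-15", "2023-12-20", "2023-03-05"]
    ),
    "purchase_amount": [50.0, 30.0, 20.0, 400.0],
})

# Date from which recency is measured (an assumption for this sketch)
reference_date = pd.Timestamp("2023-12-31")

summary = tx.groupby("customer_id").agg(
    last_purchase=("transaction_date", "max"),   # most recent purchase
    frequency=("transaction_date", "count"),     # number of purchases
    monetary=("purchase_amount", "sum"),         # total spend
)
# Recency = days between the reference date and the last purchase
summary["recency"] = (reference_date - summary["last_purchase"]).dt.days

print(summary[["recency", "frequency", "monetary"]])
```

Customer A bought recently and often but spent little (recency 11, frequency 3, monetary 100), while customer B made a single large purchase long ago (recency 301, frequency 1, monetary 400). The three metrics capture different behavior, which is why they are scored separately.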
Setting Up the Environment
First, install the required libraries:
pip install pandas numpy
Step 1: Importing Libraries and Creating Sample Data
Let's start by importing the necessary libraries and creating sample customer transaction data:
import pandas as pd
import numpy as np
# Create sample transaction data
np.random.seed(42)  # fixed seed so the data is reproducible
dates = pd.date_range('2023-01-01', '2023-12-01', freq='D')
data = []
for i in range(1000):
    customer_id = np.random.randint(1001, 1201)   # 200 possible customer IDs
    transaction_date = np.random.choice(dates)    # random day in the range
    purchase_amount = np.random.uniform(10, 500)  # amount between 10 and 500
    data.append([customer_id, transaction_date, purchase_amount])
df = pd.DataFrame(data, columns=['customer_id', 'transaction_date', 'purchase_amount'])
df['transaction_date'] = pd.to_datetime(df['transaction_date'])
print("Sample data:")
print(df.head())
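The output has the following shape (the values shown are illustrative; with the code above, every purchase_amount falls between 10 and 500):
Sample data:
   customer_id transaction_date  purchase_amount
0         1025       2023-05-13       374.540119
1         1096       2023-01-15        95.071431
2         1142       2023-06-15       731.993942
3         1061       2023-04-09       598.658484
4         1170       2023-11-09       156.018640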
Step 2: Calculating RFM Metrics
Now we'll calculate the three RFM metrics for each customer:
# Set analysis date (typically current date or end of analysis period)
analysis_date = pd.to_datetime('2023-12-31')
# Calculate RFM metrics with named aggregation (one output column per metric)
rfm = df.groupby('customer_id').agg(
    recency=('transaction_date', lambda x: (analysis_date - x.max()).days),  # days since last purchase
    frequency=('transaction_date', 'count'),  # number of transactions
    monetary=('purchase_amount', 'sum')       # total amount spent
)
print("RFM metrics:")
print(rfm.head())
print("\nRFM summary statistics:")
print(rfm.describe())
RFM metrics:
recency frequency monetary
customer_id
1001 48 4 1181.367227
1002 31 6 953.581263
1003 102 5 1592.051239
1004 49 3 923.806353
1005 57 5 1285.765464
RFM summary statistics:
recency frequency monetary
count 200.000000 200.000000 200.000000
mean 92.315000 5.000000 1244.473935
std 71.234561 2.872983 819.945742
min 1.000000 1.000000 49.624726
25% 31.250000 3.000000 640.234598
50% 75.500000 5.000000 1165.582041
75% 140.750000 7.000000 1738.925393
max 334.000000 13.000000 4196.773438
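The analysis date does not have to be hard-coded. One common convention, shown here as a sketch rather than as this tutorial's method, is to take the day after the last transaction in the data so that the most recent customer has a recency of 1. The name `snapshot_date` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical transactions (illustrative values only)
tx = pd.DataFrame({
    "customer_id": [1001, 1002, 1001],
    "transaction_date": pd.to_datetime(["2023-11-30", "2023-12-01", "2023-06-01"]),
})

# Snapshot date: one day after the latest transaction in the dataset
snapshot_date = tx["transaction_date"].max() + pd.Timedelta(days=1)

# Recency per customer, measured from the snapshot date
last_purchase = tx.groupby("customer_id")["transaction_date"].max()
recency = (snapshot_date - last_purchase).dt.days

print(snapshot_date.date())
print(recency)
```

Deriving the date from the data keeps recency values comparable when the same script is rerun on a newer export.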
Step 3: Assigning RFM Scores
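We'll assign scores from 1-5 using quintiles, where 5 is the best score for every metric. Higher frequency and monetary values earn higher scores, while a lower recency value (a more recent purchase) earns a higher score, so the recency labels are reversed: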
# Assign scores using quintiles
rfm['recency_score'] = pd.qcut(rfm['recency'], q=5, labels=[5,4,3,2,1]) # Lower recency = higher score
rfm['frequency_score'] = pd.qcut(rfm['frequency'].rank(method='first'), q=5, labels=[1,2,3,4,5])
rfm['monetary_score'] = pd.qcut(rfm['monetary'], q=5, labels=[1,2,3,4,5])
# Convert scores to integers
rfm['recency_score'] = rfm['recency_score'].astype(int)
rfm['frequency_score'] = rfm['frequency_score'].astype(int)
rfm['monetary_score'] = rfm['monetary_score'].astype(int)
# Create combined RFM score
rfm['rfm_score'] = rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str) + rfm['monetary_score'].astype(str)
print("RFM scores:")
print(rfm[['recency', 'frequency', 'monetary', 'recency_score', 'frequency_score', 'monetary_score', 'rfm_score']].head(10))
RFM scores:
recency frequency monetary recency_score frequency_score monetary_score rfm_score
customer_id
1001 48 4 1181.367227 4 2 3 423
1002 31 6 953.581263 4 3 2 432
1003 102 5 1592.051239 3 3 4 334
1004 49 3 923.806353 4 1 2 412
1005 57 5 1285.765464 3 3 3 333
1006 334 1 49.624726 1 1 1 111
1007 257 1 367.886423 1 1 1 111
1008 191 1 234.256189 2 1 1 211
1009 113 2 698.892486 3 1 2 312
1010 164 7 2061.773438 2 4 5 245
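The two details of the scoring step, reversed labels for recency and ranking before binning frequency, are easier to see on a small example. The sketch below uses two hypothetical ten-value series; it assumes only `pd.qcut` and `Series.rank`, both of which appear in the code above.

```python
import pandas as pd

# Hypothetical metric values for ten customers (illustrative only)
recency = pd.Series([3, 10, 25, 40, 90, 120, 200, 250, 300, 330])
frequency = pd.Series([1, 1, 1, 1, 2, 2, 3, 5, 8, 13])

# Reversed labels: the smallest recency values land in the bin labeled 5
recency_score = pd.qcut(recency, q=5, labels=[5, 4, 3, 2, 1]).astype(int)

# Heavily tied values produce duplicate quantile edges, which qcut rejects
try:
    pd.qcut(frequency, q=5, labels=[1, 2, 3, 4, 5])
    ties_ok = True
except ValueError:
    ties_ok = False

# Ranking first gives every customer a distinct position, so the edges are unique
frequency_score = pd.qcut(
    frequency.rank(method="first"), q=5, labels=[1, 2, 3, 4, 5]
).astype(int)

print(recency_score.tolist())
print(ties_ok)
print(frequency_score.tolist())
```

The trade-off of `rank(method="first")` is that customers with identical frequency can receive different scores: the four customers with a single purchase are split between scores 1 and 2 here, based only on their row order.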
Step 4: Customer Segmentation
Let's create customer segments based on RFM scores:
# Define customer segments based on RFM scores.
# Branches are checked in order, so each score should appear in only one list.
def segment_customers(row):
    if row['rfm_score'] in ['555', '554', '544', '545', '454', '455', '445']:
        return 'Champions'
    elif row['rfm_score'] in ['543', '444', '435', '355', '354', '345', '344', '335']:
        return 'Loyal Customers'
    elif row['rfm_score'] in ['512', '511', '422', '421', '412', '411', '311']:
        return 'Potential Loyalists'
    elif row['rfm_score'] in ['533', '532', '531', '523', '522', '521', '515', '514', '513', '425', '424', '413', '414', '415', '315', '314', '313']:
        return 'New Customers'
    elif row['rfm_score'] in ['155', '154', '144', '214', '215', '115', '114']:
        return 'Cannot Lose Them'
    elif row['rfm_score'] in ['255', '254', '245', '244', '253', '252', '243', '242', '235', '234', '225', '224', '153', '152', '145', '143', '142', '135', '134', '125', '124']:
        return 'At Risk'
    elif row['rfm_score'] in ['331', '321', '312', '221', '213', '231', '241', '251']:
        return 'Price Sensitive'
    elif row['rfm_score'] in ['122', '123', '132', '133', '141', '131']:
        return 'Hibernating'
    else:
        return 'Others'
rfm['segment'] = rfm.apply(segment_customers, axis=1)
# Display segment distribution
print("Customer segments distribution:")
print(rfm['segment'].value_counts())
Customer segments distribution:
Others                 119
Price Sensitive         23
Loyal Customers         20
At Risk                 15
New Customers           12
Hibernating              6
Champions                3
Potential Loyalists      2
Name: segment, dtype: int64
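A lookup table is one alternative to a long if/elif chain, and it makes it easy to detect a score that has been listed under two segments. The sketch below is illustrative only: `SEGMENT_CODES` is a hypothetical, deliberately shortened mapping, not the full segment definition used above.

```python
import pandas as pd

# Hypothetical, shortened mapping of segment name -> RFM score codes
SEGMENT_CODES = {
    "Champions": ["555", "554", "545"],
    "At Risk": ["255", "254", "245"],
    "Hibernating": ["122", "123", "132"],
}

# Invert the mapping to score -> segment, rejecting duplicate codes
score_to_segment = {}
for segment, codes in SEGMENT_CODES.items():
    for code in codes:
        if code in score_to_segment:
            raise ValueError(f"Score {code} is assigned to more than one segment")
        score_to_segment[code] = segment

# Example scores indexed by hypothetical customer IDs
scores = pd.Series(["555", "245", "123", "333"], index=[1001, 1002, 1003, 1004])

# Scores missing from the table fall back to 'Others'
segments = scores.map(score_to_segment).fillna("Others")
print(segments)
```

Because `Series.map` is vectorized, this also avoids calling a Python function once per row, which can matter on large customer tables.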
Step 5: Analyzing Results
Let's analyze the characteristics of each customer segment:
# Analyze segment characteristics
segment_analysis = rfm.groupby('segment').agg({
    'recency': 'mean',
    'frequency': 'mean',
    'monetary': 'mean',
    'rfm_score': 'count'  # customer_id is the index, not a column, so count rows via rfm_score
}).round(2)
segment_analysis.columns = ['Avg_Recency', 'Avg_Frequency', 'Avg_Monetary', 'Customer_Count']
segment_analysis = segment_analysis.sort_values('Avg_Monetary', ascending=False)
print("Segment Analysis:")
print(segment_analysis)
Segment Analysis:
Avg_Recency Avg_Frequency Avg_Monetary Customer_Count
Champions 15.67 8.33 3310.69 3
Loyal Customers 66.70 6.80 2164.27 20
At Risk 78.53 5.07 1631.80 15
New Customers 59.58 4.42 1423.86 12
Others 103.32 4.82 1108.73 119
Potential Loyalists 21.50 3.50 818.29 2
Price Sensitive 85.83 4.39 679.61 23
Hibernating 171.17 3.50 661.87 6
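Segment averages can be complemented by each segment's share of total revenue and by a list of customer IDs to target. The following is a minimal sketch on a hypothetical four-customer table named `rfm_demo`; the values are made up for illustration and are not results from the dataset above.

```python
import pandas as pd

# Hypothetical segmented customers (illustrative values only)
rfm_demo = pd.DataFrame(
    {
        "monetary": [3000.0, 1000.0, 500.0, 500.0],
        "segment": ["Champions", "At Risk", "At Risk", "Hibernating"],
    },
    index=pd.Index([1001, 1002, 1003, 1004], name="customer_id"),
)

# Percentage of total revenue contributed by each segment
revenue_share = (
    rfm_demo.groupby("segment")["monetary"].sum() / rfm_demo["monetary"].sum() * 100
)

# Customer IDs in one segment, e.g. as input for a win-back campaign
at_risk_ids = rfm_demo.index[rfm_demo["segment"] == "At Risk"].tolist()

print(revenue_share.round(1))
print(at_risk_ids)
```

In this toy table the single Champions customer accounts for 60% of revenue, which shows how a small segment can still dominate sales.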
Conclusion
RFM analysis summarizes customer behavior by scoring recency, frequency, and monetary value. With pandas, the metrics come from a single groupby aggregation and the scores from quintile binning with qcut, so the whole pipeline fits in a few dozen lines. Use the resulting segments to create targeted marketing campaigns and improve customer retention strategies.
