Customer Purchasing Behavior Analysis - Problem

You're building a comprehensive customer analytics system for an e-commerce platform! Given two database tables containing transaction history and product information, you need to analyze customer purchasing behavior and calculate key metrics that drive business decisions.

Your task is to generate a detailed customer profile for each customer that includes:

  • Financial Metrics: Total spending, average transaction amount
  • Engagement Metrics: Number of transactions, unique product categories explored
  • Behavioral Insights: Most frequently purchased category (with recency tiebreaker)
  • Loyalty Score: A custom formula combining transaction frequency and spending: (transactions ร— 10) + (total_amount รท 100)

The challenge lies in handling tied categories - when a customer purchases equally from multiple categories, you must select the category from their most recent transaction. Results should be ordered by loyalty score (descending), then by customer ID (ascending).

This problem mimics real-world customer segmentation and loyalty program analysis used by companies like Amazon, Netflix, and Spotify!

Input & Output

example_1.sql โ€” Basic Customer Analysis
$ Input: Transactions: [(1,101,1,'2023-01-01',100.00), (2,101,2,'2023-01-15',150.00), (5,101,3,'2023-02-10',200.00)]\nProducts: [(1,'A',100.00), (2,'B',150.00), (3,'C',200.00)]
โ€บ Output: customer_id=101, total_amount=450.00, transaction_count=3, unique_categories=3, avg_transaction_amount=150.00, top_category='C', loyalty_score=34.50
๐Ÿ’ก Note: Customer 101 has 3 transactions totaling $450. Since each category (A,B,C) has 1 purchase, we choose category C as it has the most recent transaction (2023-02-10). Loyalty score = (3ร—10) + (450รท100) = 34.50
example_2.sql โ€” Tie-Breaking Scenario
$ Input: Transactions: [(3,102,1,'2023-01-01',100.00), (4,102,3,'2023-01-22',200.00)]\nProducts: [(1,'A',100.00), (3,'C',200.00)]
โ€บ Output: customer_id=102, total_amount=300.00, transaction_count=2, unique_categories=2, avg_transaction_amount=150.00, top_category='C', loyalty_score=23.00
๐Ÿ’ก Note: Customer 102 has equal purchases in categories A and C (1 each). The tie is broken by recency - category C was purchased on 2023-01-22 vs category A on 2023-01-01. Loyalty score = (2ร—10) + (300รท100) = 23.00
example_3.sql โ€” Single Category Customer
$ Input: Transactions: [(6,103,1,'2023-01-05',100.00), (7,103,1,'2023-01-10',100.00)]\nProducts: [(1,'A',100.00)]
โ€บ Output: customer_id=103, total_amount=200.00, transaction_count=2, unique_categories=1, avg_transaction_amount=100.00, top_category='A', loyalty_score=22.00
๐Ÿ’ก Note: Customer 103 only purchases from category A (2 transactions). No tie-breaking needed. Loyalty score = (2ร—10) + (200รท100) = 22.00

Visualization

Tap to expand
Raw DataTransactions+ ProductsEnrichmentJOIN tablesAdd categoriesPattern AnalysisWindow FunctionsCategory RankingTie BreakingFrequency + RecencyMost Recent WinsFinal MetricsLoyalty ScoreAll AnalyticsCustomer101Category A: 1ร—Category B: 1ร—Category C: 1ร— โœ“Top CategoryC (Most Recent)Loyalty Score(3 ร— 10) + (450/100)= 34.50Customer Analytics PipelineSingle-Pass Processing with Window Functions
Understanding the Visualization
1
Data Enrichment
JOIN transaction records with product catalog to add category information - like linking receipts to store departments
2
Pattern Analysis
Use window functions to analyze shopping patterns: frequency of category purchases and recency of visits
3
Tie-Breaking Logic
When customers shop equally across categories, choose their most recent preference - mimicking real customer behavior
4
Loyalty Scoring
Calculate composite loyalty score combining transaction frequency (engagement) and spending (value) - like airline miles systems
Key Takeaway
๐ŸŽฏ Key Insight: Use window functions with proper ORDER BY clauses to handle tie-breaking elegantly - ORDER BY frequency DESC, recency DESC ensures the most frequently purchased category wins, with most recent transaction breaking ties. This mimics real customer preference analysis!

Time & Space Complexity

Time Complexity
โฑ๏ธ
O(n log n)

Single scan of joined tables (n) plus sorting for window functions and final ORDER BY

n
2n
โšก Linearithmic
Space Complexity
O(k)

Where k is the number of unique customers for intermediate window function results

n
2n
โœ“ Linear Space

Constraints

  • 1 โ‰ค transaction_id โ‰ค 106
  • 1 โ‰ค customer_id โ‰ค 104
  • 1 โ‰ค product_id โ‰ค 103
  • 0.01 โ‰ค amount โ‰ค 104
  • transaction_date is in 'YYYY-MM-DD' format
  • Each customer will have at least 1 transaction
  • Product categories are non-empty varchar strings
  • All amounts should be rounded to 2 decimal places
Asked in
Amazon 85 Meta 72 Google 68 Netflix 45 Spotify 38
54.2K Views
High Frequency
~25 min Avg. Time
1.8K Likes
Ln 1, Col 1
Smart Actions
๐Ÿ’ก Explanation
AI Ready
๐Ÿ’ก Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen