Find Category Recommendation Pairs - Problem
Amazon's Category Cross-Shopping Analytics Challenge

Amazon wants to identify cross-category shopping patterns to improve their recommendation engine. Given two tables containing customer purchase data and product information, your task is to find pairs of product categories that show strong customer overlap.

Your Mission:
• Find all category pairs where category1 < category2 (alphabetically)
• Count unique customers who purchased from both categories in each pair
• Only report pairs with at least 3 customers shopping across both categories

Input Tables:
ProductPurchases: Contains user purchases with quantities
ProductInfo: Maps products to categories and prices

Output: Category pairs ordered by customer count (descending), then alphabetically by category names.

This data helps Amazon understand which product categories naturally complement each other in customer shopping behavior!

Input & Output

example_1.sql — Basic Cross-Category Analysis
$ Input: ProductPurchases: | user_id | product_id | quantity | |---------|------------|----------| | 1 | 101 | 2 | | 1 | 201 | 1 | | 2 | 101 | 1 | | 2 | 301 | 3 | | 3 | 201 | 2 | | 3 | 301 | 1 | | 4 | 101 | 1 | | 4 | 201 | 2 | | 4 | 301 | 1 | ProductInfo: | product_id | category | price | |------------|-------------|-------| | 101 | Electronics | 299.99| | 201 | Books | 19.99 | | 301 | Clothing | 49.99 |
Output: | category1 | category2 | customer_count | |-------------|-------------|----------------| | Books | Electronics | 2 | | Books | Clothing | 2 |
💡 Note: Users 1 and 4 bought from both Books and Electronics (count=2). Users 2, 3, and 4 represent cross-category shopping, but Electronics-Clothing only has user 4 (count=1 < 3), so it's filtered out.
example_2.sql — Multiple Category Overlaps
$ Input: ProductPurchases: | user_id | product_id | quantity | |---------|------------|----------| | 1 | 101 | 1 | | 1 | 201 | 1 | | 1 | 301 | 1 | | 2 | 101 | 2 | | 2 | 201 | 1 | | 3 | 101 | 1 | | 3 | 301 | 2 | | 4 | 201 | 1 | | 4 | 301 | 1 | | 5 | 101 | 1 | | 5 | 201 | 2 | | 6 | 101 | 1 | | 6 | 301 | 1 | ProductInfo: | product_id | category | price | |------------|-------------|-------| | 101 | Electronics | 199.99| | 201 | Books | 24.99 | | 301 | Clothing | 79.99 |
Output: | category1 | category2 | customer_count | |-------------|-------------|----------------| | Books | Electronics | 3 | | Clothing | Electronics | 3 |
💡 Note: Books-Electronics: users 1, 2, 5 (count=3). Clothing-Electronics: users 1, 3, 6 (count=3). Books-Clothing: users 1, 4 (count=2 < 3, filtered out). Results ordered by count DESC, then alphabetically.
example_3.sql — Edge Case: No Valid Pairs
$ Input: ProductPurchases: | user_id | product_id | quantity | |---------|------------|----------| | 1 | 101 | 1 | | 2 | 201 | 1 | | 3 | 301 | 1 | | 4 | 101 | 2 | | 5 | 201 | 1 | ProductInfo: | product_id | category | price | |------------|-------------|-------| | 101 | Electronics | 399.99| | 201 | Books | 14.99 | | 301 | Clothing | 89.99 |
Output: | category1 | category2 | customer_count | |-------------|-------------|----------------|
💡 Note: No users purchased from multiple categories, so no category pairs have cross-shopping customers. The result is empty since no pairs meet the minimum threshold of 3 customers.

Visualization

Tap to expand
User PurchasesProductPurchases+ ProductInfoCategory MappingUser → CategoriesJOIN OperationCross-ShoppingCategory PairsSelf-JOINFinal ResultsGROUP BY + COUNTFilter (≥3 customers)ORDER BY count, names1234🏬 Mall Analytics: Finding Store Type SynergiesIdentify which product categories customers frequently buy togetherOptimized with SQL JOINs and GROUP BY aggregation
Understanding the Visualization
1
Connect Data
JOIN purchases with product info to map users to categories they've shopped
2
Find Overlaps
Self-JOIN to identify users who purchased from multiple categories
3
Group & Count
GROUP BY category pairs and COUNT distinct cross-shopping customers
4
Filter & Rank
Keep pairs with 3+ customers, sort by popularity then alphabetically
Key Takeaway
🎯 Key Insight: Self-joining user-category data efficiently reveals cross-shopping patterns without nested loops, leveraging database indexing for optimal performance.

Time & Space Complexity

Time Complexity
⏱️
O(P + U×C²)

P for initial join, then U users times C² category combinations, but optimized with indexing

n
2n
Linear Growth
Space Complexity
O(U×C)

Space for user-category relationships and intermediate join results

n
2n
Linear Space

Constraints

  • 1 ≤ Number of purchases ≤ 105
  • 1 ≤ user_id, product_id ≤ 104
  • 1 ≤ quantity ≤ 100
  • 1 ≤ Number of products ≤ 104
  • 1 ≤ price ≤ 104.00
  • Category names contain only lowercase letters and spaces
  • Each (user_id, product_id) combination is unique in ProductPurchases
  • Each product_id is unique in ProductInfo
Asked in
Amazon 85 Google 62 Meta 45 Microsoft 38 Netflix 29
78.4K Views
High Frequency
~25 min Avg. Time
1.8K Likes
Ln 1, Col 1
Smart Actions
💡 Explanation
AI Ready
💡 Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen