Fix Product Name Format - Problem
๐๏ธ Sales Data Cleanup Challenge
You're working as a data analyst for an e-commerce company, and you've discovered that the sales database from the year 2000 is a complete mess! ๐ฑ
The Sales table contains product sales data, but since it was filled manually, the product names are inconsistent:
- Some have extra spaces at the beginning or end
- Some are in UPPERCASE, others in lowercase, and some are MiXeD cAsE
- You need to clean this data and generate a monthly sales report
Your mission: Transform messy product names into a clean, standardized format and count how many times each product was sold per month.
| Column Name | Type |
|---|---|
| sale_id | int |
| product_name | varchar |
| sale_date | date |
Output Requirements:
- Product names should be
lowercaseandtrimmedof whitespace - Dates should be formatted as
'YYYY-MM' - Include the total count of sales for each product per month
- Results ordered by
product_nameASC, thensale_dateASC
Input & Output
example_1.sql โ Basic Product Name Cleanup
$
Input:
Sales table:
+----------+----------------+------------+
| sale_id | product_name | sale_date |
+----------+----------------+------------+
| 1 | ' iPhone ' | 2000-01-01 |
| 2 | 'IPHONE' | 2000-01-15 |
| 3 | ' Samsung ' | 2000-02-01 |
| 4 | 'iphone' | 2000-01-20 |
| 5 | 'SAMSUNG' | 2000-02-10 |
+----------+----------------+------------+
โบ
Output:
+----------------+------------+-------+
| product_name | sale_date | total |
+----------------+------------+-------+
| 'iphone' | '2000-01' | 3 |
| 'samsung' | '2000-02' | 2 |
+----------------+------------+-------+
๐ก Note:
Product names are cleaned (lowercased and trimmed), dates are formatted to YYYY-MM, and sales are counted per product per month. 'iPhone' variants (3 sales) are grouped together, and 'Samsung' variants (2 sales) are grouped together.
example_2.sql โ Multiple Months Same Product
$
Input:
Sales table:
+----------+----------------+------------+
| sale_id | product_name | sale_date |
+----------+----------------+------------+
| 1 | ' MacBook ' | 2000-01-05 |
| 2 | 'MACBOOK' | 2000-01-15 |
| 3 | ' macbook' | 2000-02-01 |
| 4 | 'MacBook Pro' | 2000-01-20 |
+----------+----------------+------------+
โบ
Output:
+----------------+------------+-------+
| product_name | sale_date | total |
+----------------+------------+-------+
| 'macbook' | '2000-01' | 2 |
| 'macbook' | '2000-02' | 1 |
| 'macbook pro' | '2000-01' | 1 |
+----------------+------------+-------+
๐ก Note:
Same product 'macbook' appears in different months, so it gets separate entries. 'MacBook Pro' is treated as a different product. Results are ordered by product name first, then by date.
example_3.sql โ Edge Case with Extreme Whitespace
$
Input:
Sales table:
+----------+-------------------------+------------+
| sale_id | product_name | sale_date |
+----------+-------------------------+------------+
| 1 | '\t\t XBOX \n ' | 2000-03-01 |
| 2 | ' Xbox ' | 2000-03-15 |
| 3 | ' ' | 2000-03-20 |
| 4 | 'PlayStation' | 2000-03-01 |
+----------+-------------------------+------------+
โบ
Output:
+----------------+------------+-------+
| product_name | sale_date | total |
+----------------+------------+-------+
| '' | '2000-03' | 1 |
| 'playstation' | '2000-03' | 1 |
| 'xbox' | '2000-03' | 2 |
+----------------+------------+-------+
๐ก Note:
Extreme whitespace (tabs, newlines, multiple spaces) is handled by TRIM function. Empty string after trimming is treated as a valid product name. Xbox variants are properly grouped together.
Visualization
Tap to expand
Understanding the Visualization
1
Input Raw Data
Messy sales records with inconsistent product names and full dates
2
Clean Product Names
Apply LOWER() and TRIM() to standardize all product names
3
Extract Month
Format dates to YYYY-MM for monthly grouping
4
Group & Count
Use GROUP BY to aggregate sales counts per product per month
5
Sort Results
ORDER BY product name, then date for consistent output
Key Takeaway
๐ฏ Key Insight: SQL's GROUP BY with aggregate functions transforms messy data into clean reports in a single optimized query, leveraging database engine optimizations for maximum performance.
Time & Space Complexity
Time Complexity
O(n log n)
Database engine optimizes grouping and sorting, typically O(n log n) for sorting
โก Linearithmic
Space Complexity
O(k)
Where k is the number of unique product-month combinations (much smaller than n)
โ Linear Space
Constraints
- 1 โค number of sales records โค 105
- 1 โค product_name.length โค 100
- product_name may contain leading/trailing whitespace and mixed case
- sale_date is in valid YYYY-MM-DD format
- All sale_date values are from the year 2000
๐ก
Explanation
AI Ready
๐ก Suggestion
Tab
to accept
Esc
to dismiss
// Output will appear here after running code