Top Percentile Fraud - Problem
LeetCode Insurance Corp has developed a cutting-edge machine learning model to detect fraudulent insurance claims. The company wants to deploy their most experienced claim adjusters to handle the riskiest cases - specifically the top 5% of claims in each state based on fraud scores.
Given a table Fraud containing policy information:
| Column Name | Type |
|---|---|
| policy_id | int |
| state | varchar |
| fraud_score | int |
Your task is to identify the top 5 percentile of claims from each state and return them ordered by:
statein ascending orderfraud_scorein descending orderpolicy_idin ascending order
The higher the fraud score, the more likely the claim is fraudulent. Help the insurance company allocate their resources efficiently!
Input & Output
example_1.sql โ Basic Case
$
Input:
Fraud table:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 1 | CA | 75 |
| 2 | CA | 85 |
| 3 | CA | 90 |
| 4 | CA | 95 |
| 5 | NY | 70 |
| 6 | NY | 80 |
| 7 | NY | 88 |
| 8 | NY | 92 |
โบ
Output:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 4 | CA | 95 |
| 8 | NY | 92 |
๐ก Note:
For CA with 4 records, top 5% means the highest scoring record (95). For NY with 4 records, top 5% is also the highest record (92). Results are ordered by state ASC, fraud_score DESC.
example_2.sql โ Multiple Top Records
$
Input:
Fraud table:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 1 | TX | 60 |
| 2 | TX | 65 |
| 3 | TX | 70 |
| 4 | TX | 75 |
| 5 | TX | 80 |
| 6 | TX | 85 |
| 7 | TX | 90 |
| 8 | TX | 95 |
| 9 | TX | 97 |
| 10 | TX | 99 |
โบ
Output:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 10 | TX | 99 |
๐ก Note:
TX has 10 records, so top 5% includes 1 record (10 * 0.05 = 0.5, rounded up to 1). The highest fraud score is 99 with policy_id 10.
example_3.sql โ Tie-breaking by Policy ID
$
Input:
Fraud table:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 101 | FL | 88 |
| 102 | FL | 92 |
| 103 | FL | 95 |
| 104 | FL | 95 |
| 105 | WA | 85 |
| 106 | WA | 90 |
| 107 | WA | 93 |
| 108 | WA | 93 |
โบ
Output:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 103 | FL | 95 |
| 104 | FL | 95 |
| 107 | WA | 93 |
| 108 | WA | 93 |
๐ก Note:
FL has 4 records, top 5% includes the top record(s) with score 95. Both policies 103 and 104 have the same score, so both are included and ordered by policy_id ASC. Same logic applies to WA state.
Constraints
- 1 โค Number of records โค 105
- 1 โค policy_id โค 106
- 1 โค fraud_score โค 100
- Each policy_id is unique
- State names are valid US state abbreviations
Visualization
Tap to expand
Understanding the Visualization
1
Group by State
Partition records by state, like separating students by grade level
2
Calculate Rankings
Use PERCENT_RANK() to determine where each fraud score falls within its state
3
Filter Top 5%
Select only records with percentile_rank >= 0.95 (top 5 percentile)
4
Apply Final Ordering
Sort by state ASC, fraud_score DESC, policy_id ASC for consistent results
Key Takeaway
๐ฏ Key Insight: Window functions with PARTITION BY allow us to calculate percentiles within groups efficiently, making this perfect for state-wise ranking problems. The PERCENT_RANK() function handles the complex math automatically!
๐ก
Explanation
AI Ready
๐ก Suggestion
Tab
to accept
Esc
to dismiss
// Output will appear here after running code