Top Percentile Fraud - Problem

LeetCode Insurance Corp has developed a cutting-edge machine learning model to detect fraudulent insurance claims. The company wants to deploy their most experienced claim adjusters to handle the riskiest cases - specifically the top 5% of claims in each state based on fraud scores.

Given a table Fraud containing policy information:

Column NameType
policy_idint
statevarchar
fraud_scoreint

Your task is to identify the top 5 percentile of claims from each state and return them ordered by:

  1. state in ascending order
  2. fraud_score in descending order
  3. policy_id in ascending order

The higher the fraud score, the more likely the claim is fraudulent. Help the insurance company allocate their resources efficiently!

Input & Output

example_1.sql โ€” Basic Case
$ Input: Fraud table: | policy_id | state | fraud_score | |-----------|-------|-------------| | 1 | CA | 75 | | 2 | CA | 85 | | 3 | CA | 90 | | 4 | CA | 95 | | 5 | NY | 70 | | 6 | NY | 80 | | 7 | NY | 88 | | 8 | NY | 92 |
โ€บ Output: | policy_id | state | fraud_score | |-----------|-------|-------------| | 4 | CA | 95 | | 8 | NY | 92 |
๐Ÿ’ก Note: For CA with 4 records, top 5% means the highest scoring record (95). For NY with 4 records, top 5% is also the highest record (92). Results are ordered by state ASC, fraud_score DESC.
example_2.sql โ€” Multiple Top Records
$ Input: Fraud table: | policy_id | state | fraud_score | |-----------|-------|-------------| | 1 | TX | 60 | | 2 | TX | 65 | | 3 | TX | 70 | | 4 | TX | 75 | | 5 | TX | 80 | | 6 | TX | 85 | | 7 | TX | 90 | | 8 | TX | 95 | | 9 | TX | 97 | | 10 | TX | 99 |
โ€บ Output: | policy_id | state | fraud_score | |-----------|-------|-------------| | 10 | TX | 99 |
๐Ÿ’ก Note: TX has 10 records, so top 5% includes 1 record (10 * 0.05 = 0.5, rounded up to 1). The highest fraud score is 99 with policy_id 10.
example_3.sql โ€” Tie-breaking by Policy ID
$ Input: Fraud table: | policy_id | state | fraud_score | |-----------|-------|-------------| | 101 | FL | 88 | | 102 | FL | 92 | | 103 | FL | 95 | | 104 | FL | 95 | | 105 | WA | 85 | | 106 | WA | 90 | | 107 | WA | 93 | | 108 | WA | 93 |
โ€บ Output: | policy_id | state | fraud_score | |-----------|-------|-------------| | 103 | FL | 95 | | 104 | FL | 95 | | 107 | WA | 93 | | 108 | WA | 93 |
๐Ÿ’ก Note: FL has 4 records, top 5% includes the top record(s) with score 95. Both policies 103 and 104 have the same score, so both are included and ordered by policy_id ASC. Same logic applies to WA state.

Constraints

  • 1 โ‰ค Number of records โ‰ค 105
  • 1 โ‰ค policy_id โ‰ค 106
  • 1 โ‰ค fraud_score โ‰ค 100
  • Each policy_id is unique
  • State names are valid US state abbreviations

Visualization

Tap to expand
Insurance Fraud Detection: Top 5% SelectionCalifornia Claims75828595TOP 5% โญNew York Claims70808898TOP 5% โญML Model Ranking Process1. PARTITION BY state2. ORDER BY fraud_score ASC3. PERCENT_RANK() calculation4. Filter >= 0.95 percentile๐Ÿ” Assign Senior Adjustersto highest risk claimsFinal Results: Ordered OutputCA: Score 95 (ID: 4)NY: Score 98 (ID: 8)๐ŸŽฏ Efficient Resource Allocation: Deploy experienced adjusters to highest-risk cases!
Understanding the Visualization
1
Group by State
Partition records by state, like separating students by grade level
2
Calculate Rankings
Use PERCENT_RANK() to determine where each fraud score falls within its state
3
Filter Top 5%
Select only records with percentile_rank >= 0.95 (top 5 percentile)
4
Apply Final Ordering
Sort by state ASC, fraud_score DESC, policy_id ASC for consistent results
Key Takeaway
๐ŸŽฏ Key Insight: Window functions with PARTITION BY allow us to calculate percentiles within groups efficiently, making this perfect for state-wise ranking problems. The PERCENT_RANK() function handles the complex math automatically!
Asked in
Google 25 Amazon 18 Microsoft 15 Meta 12
23.4K Views
Medium Frequency
~25 min Avg. Time
847 Likes
Ln 1, Col 1
Smart Actions
๐Ÿ’ก Explanation
AI Ready
๐Ÿ’ก Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen