Top Percentile Fraud - Problem

The Leetcode Insurance Corp has developed an ML-driven predictive model to detect the likelihood of fraudulent claims. They allocate their most seasoned claim adjusters to address the top 5% of claims flagged by this model.

Write a solution to find the top 5 percentile of claims from each state. Return the result table ordered by state in ascending order, fraud_score in descending order, and policy_id in ascending order.

The Fraud table contains:

  • policy_id - unique policy identifier
  • state - state where the policy is issued
  • fraud_score - ML model fraud likelihood score

Table Schema

Fraud
Column Name Type Description
policy_id PK int Unique policy identifier
state varchar State where policy is issued
fraud_score int ML model fraud likelihood score
Primary Key: policy_id
Note: Each row represents a policy with its fraud risk assessment

Input & Output

Example 1 — Multiple States with Top 5%
Input Table:
policy_id state fraud_score
1 CA 95
2 CA 88
3 CA 75
4 NY 92
5 NY 89
6 NY 82
7 TX 98
Output:
policy_id state fraud_score
1 CA 95
4 NY 92
7 TX 98
💡 Note:

From CA (3 policies): top 5% includes policy 1 with highest score 95. From NY (3 policies): top 5% includes policy 4 with highest score 92. From TX (1 policy): policy 7 is automatically in top 5%. Results ordered by state ASC, fraud_score DESC, policy_id ASC.

Example 2 — Tied Fraud Scores
Input Table:
policy_id state fraud_score
1 FL 90
2 FL 90
3 FL 85
4 FL 80
Output:
policy_id state fraud_score
1 FL 90
2 FL 90
💡 Note:

Both policies 1 and 2 have the same fraud_score of 90, so they both get percentile rank 0.00 (tied for first place). Since 2 out of 4 policies = 50%, but they're tied at the top, both are included in the top 5% group.

Constraints

  • 1 ≤ policy_id ≤ 100000
  • 1 ≤ fraud_score ≤ 100
  • state consists of valid US state abbreviations
  • Each state has at least 1 policy

Visualization

Tap to expand
Top Percentile Fraud - SQL Solution INPUT policy_id state fraud_score 101 CA 95 102 CA 88 103 CA 72 201 NY 91 202 NY 85 301 TX 97 ... more rows per state Fraud Table Multiple states with fraud scores 0-100 Find top 5% per state Order: state ASC, score DESC ALGORITHM STEPS 1 PERCENT_RANK() Compute percentile per state PERCENT_RANK() OVER ( PARTITION BY state ORDER BY fraud_score DESC) 2 PARTITION BY state Rank within each state group CA 0.00-1.00 NY 0.00-1.00 TX 0.00-1.00 3 Filter Top 5% WHERE pct_rank <= 0.05 percentile_rank <= 0.05 4 ORDER BY state, fraud_score DESC, id FINAL RESULT policy_id state fraud_score 101 CA 95 201 NY 91 301 TX 97 OK Top 5% Fraud Claims Highest fraud scores from each state Properly ordered SELECT policy_id, state, fraud_score FROM (subquery with rank) WHERE pct <= 0.05 ORDER BY state, score DESC Key Insight: PERCENT_RANK() returns values from 0.0 (first row) to 1.0 (last row) within each partition. By ordering DESC and filtering <= 0.05, we get the top 5% highest fraud scores per state. Time Complexity: O(n log n) for window function sorting | Space Complexity: O(n) for ranking storage TutorialsPoint - Top Percentile Fraud | Optimal Solution
Asked in
Amazon 15 Microsoft 12 Google 8
28.4K Views
Medium Frequency
~18 min Avg. Time
890 Likes
Ln 1, Col 1
Smart Actions
💡 Explanation
AI Ready
💡 Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen