Top Percentile Fraud - Problem

Database Medium

The Leetcode Insurance Corp has developed an ML-driven predictive model to detect the likelihood of fraudulent claims. They allocate their most seasoned claim adjusters to address the top 5% of claims flagged by this model.

Write a solution to find the top 5 percent of claims in each state. Return the result table ordered by state in ascending order, fraud_score in descending order, and policy_id in ascending order.

The Fraud table contains:

policy_id - unique policy identifier
state - state where the policy is issued
fraud_score - ML model fraud likelihood score

Table Schema

Fraud

Column Name	Type	Description
`policy_id` PK	int	Unique policy identifier
`state`	varchar	State where policy is issued
`fraud_score`	int	ML model fraud likelihood score

Primary Key: policy_id

Note: Each row represents a policy with its fraud risk assessment

Input & Output

Example 1 — Multiple States with Top 5%

Input Table:

policy_id	state	fraud_score
1	CA	95
2	CA	88
3	CA	75
4	NY	92
5	NY	89
6	NY	82
7	TX	98

Output:

policy_id	state	fraud_score
1	CA	95
4	NY	92
7	TX	98

💡 Note:

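From CA (3 policies): policy 1 has the highest score (95), so its PERCENT_RANK is 0; policies 2 and 3 fall at 0.5 and 1.0 and are excluded. From NY (3 policies): policy 4 is included with the highest score (92). From TX (1 policy): a state with a single policy gives that policy a PERCENT_RANK of 0, so policy 7 is automatically in the top 5%. Results are ordered by state ASC, fraud_score DESC, policy_id ASC.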

Example 2 — Tied Fraud Scores

Input Table:

policy_id	state	fraud_score
1	FL	90
2	FL	90
3	FL	85
4	FL	80

Output:

policy_id	state	fraud_score
1	FL	90
2	FL	90

💡 Note:

Policies 1 and 2 tie with a fraud_score of 90, so both receive rank 1 and a percentile rank of 0.00. Together they make up 2 of the 4 policies (50%), but because tied rows share the same percentile rank, both pass the ≤ 0.05 filter and are included in the top 5% group.

Constraints

1 ≤ policy_id ≤ 100000
1 ≤ fraud_score ≤ 100
state consists of valid US state abbreviations
Each state has at least 1 policy


Use the PERCENT_RANK() window function with PARTITION BY state ORDER BY fraud_score DESC to compute each policy's percentile rank within its state, then keep rows whose percentile rank is ≤ 0.05.
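A minimal sketch of that hint as a single query. To keep it runnable, it is wrapped in Python's standard sqlite3 module against an in-memory Fraud table; this assumes an SQLite build with window-function support (3.25 or newer). The helper name top_fraud and the ROWS sample (Example 1) are illustrative only. The SELECT itself should carry over to other engines that provide PERCENT_RANK(), such as MySQL 8.0.

```python
import sqlite3

# Example 1 rows: (policy_id, state, fraud_score)
ROWS = [
    (1, "CA", 95), (2, "CA", 88), (3, "CA", 75),
    (4, "NY", 92), (5, "NY", 89), (6, "NY", 82),
    (7, "TX", 98),
]

QUERY = """
SELECT policy_id, state, fraud_score
FROM (
    SELECT policy_id, state, fraud_score,
           -- relative rank within each state; highest score gets 0
           PERCENT_RANK() OVER (
               PARTITION BY state
               ORDER BY fraud_score DESC
           ) AS pct_rank
    FROM Fraud
) AS ranked
WHERE pct_rank <= 0.05
ORDER BY state ASC, fraud_score DESC, policy_id ASC
"""


def top_fraud(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Fraud ("
            "policy_id INTEGER PRIMARY KEY, state TEXT, fraud_score INTEGER)"
        )
        conn.executemany("INSERT INTO Fraud VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_fraud(ROWS):
        print(row)
```

The inner query is needed because a window function's result cannot be referenced directly in the WHERE clause of the same SELECT.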


Common Approaches

✓ Window Function with PERCENT_RANK

⏱️ Time: O(n log n) Space: O(n)

PERCENT_RANK() computes the relative rank of each row within its partition as a fraction between 0 and 1. Because rows are ordered by fraud_score DESC, higher fraud scores get values closer to 0; tied scores share the same value, and a state with a single policy gets 0.
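As a worked check, assuming the standard definition that MySQL and SQLite document for PERCENT_RANK(), where r is the row's RANK() within its state and n is the number of rows in that state. The second line applies it to the FL rows from Example 2; the third shows when a second-ranked row can pass the 0.05 filter at all.

```latex
\mathrm{PR}(r, n) = \begin{cases} \dfrac{r - 1}{n - 1}, & n > 1 \\[4pt] 0, & n = 1 \end{cases}

\text{FL } (n = 4,\; r = 1, 1, 3, 4):\quad \mathrm{PR} = 0,\; 0,\; \tfrac{2}{3},\; 1

\mathrm{PR}(2, n) = \frac{1}{n - 1} \le 0.05 \iff n \ge 21
```

So in a state with fewer than 21 policies, only the rows tied for the highest fraud_score are returned.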

Window Function with PERCENT_RANK — Algorithm Steps

Step 1: Partition data by state and rank by fraud_score descending
Step 2: Filter for records with PERCENT_RANK <= 0.05 (top 5%)
Step 3: Order results by state, fraud_score desc, policy_id asc
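To trace the three steps outside a database, here is a minimal pure-Python sketch. The function name top_five_percent and the (policy_id, state, fraud_score) tuple layout are hypothetical; it mimics RANK()-style tie handling and the (rank - 1) / (n - 1) formula, on the assumption that the SQL engine follows the standard PERCENT_RANK definition.

```python
from collections import defaultdict


def top_five_percent(rows, threshold=0.05):
    """rows: iterable of (policy_id, state, fraud_score) tuples."""
    # Step 1: partition rows by state.
    by_state = defaultdict(list)
    for row in rows:
        by_state[row[1]].append(row)

    result = []
    for group in by_state.values():
        n = len(group)
        group.sort(key=lambda r: -r[2])  # fraud_score DESC within the state
        rank = 0
        for i, row in enumerate(group):
            # RANK(): tied scores share a rank; the next distinct score skips ahead.
            if i == 0 or row[2] != group[i - 1][2]:
                rank = i + 1
            pct = 0.0 if n == 1 else (rank - 1) / (n - 1)
            # Step 2: keep rows in the top 5%.
            if pct <= threshold:
                result.append(row)

    # Step 3: state ASC, fraud_score DESC, policy_id ASC.
    result.sort(key=lambda r: (r[1], -r[2], r[0]))
    return result


if __name__ == "__main__":
    fl = [(1, "FL", 90), (2, "FL", 90), (3, "FL", 85), (4, "FL", 80)]
    print(top_five_percent(fl))  # [(1, 'FL', 90), (2, 'FL', 90)]
```

The per-state sort dominates the cost, which matches the O(n log n) time bound for this approach.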


Step-by-Step Walkthrough

Partition: group the rows by state.
Rank: order each partition by fraud_score DESC.
Filter: keep the top 5% (PERCENT_RANK ≤ 0.05).


Time & Space Complexity

Time Complexity: O(n log n) (linearithmic). Sorting is required to rank rows within each state partition.

Space Complexity: O(n) (linear). The window function needs temporary storage for the partitioned rows.
