How to find percentile rank for groups in an R data frame?

A percentile is a value below which a given percentage of the observations fall. For example, if a value lies at the 50th percentile, then 50 percent of the values lie at or below it. The number 50 here is called the percentile rank. To find the percentile rank within groups of an R data frame, we can use the group_by and mutate functions of the dplyr package and divide the rank of each value by the number of values in its group. This gives the percentile rank as a proportion between 0 and 1.
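
As a minimal sketch of the calculation itself, the percentile rank used in this article is the rank of each value divided by the number of values. The vector x below is a made-up illustration, not data from the examples; rank() averages tied ranks by default.

```r
# Hypothetical vector for illustration only
x <- c(10, 20, 20, 40)

# rank() gives 1, 2.5, 2.5, 4 (tied values share the average rank)
ranks <- rank(x)

# Divide by the number of values to get the percentile rank as a proportion
pr <- ranks / length(x)
pr
```

Multiplying pr by 100 would express the same result on a 0 to 100 scale.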

Example

Consider the below data frame −

# Build a 20-row data frame: a random group label (A to D) and a Poisson response
Group <- sample(LETTERS[1:4], 20, replace = TRUE)
Response <- rpois(20, 5)
df1 <- data.frame(Group, Response)
df1

Output

   Group Response
1      D        5
2      B        7
3      D        5
4      C        4
5      D        5
6      C        5
7      A       10
8      D        3
9      B        2
10     D        0
11     B        4
12     D        5
13     A        3
14     A        6
15     D        2
16     A        7
17     A        6
18     C        2
19     A        9
20     C        3
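
Because sample() and rpois() draw random values, the data frame above will differ on every run. A minimal sketch, assuming an arbitrary seed of 123 (not used in the original example) and the hypothetical names df1_seeded and df1_again, shows how set.seed() could make the data reproducible:

```r
# The seed value 123 is an arbitrary assumption; any fixed integer works
set.seed(123)
Group <- sample(LETTERS[1:4], 20, replace = TRUE)
Response <- rpois(20, 5)
df1_seeded <- data.frame(Group, Response)

# Resetting the same seed and repeating the same draws rebuilds the same data
set.seed(123)
df1_again <- data.frame(
  Group = sample(LETTERS[1:4], 20, replace = TRUE),
  Response = rpois(20, 5)
)

# Check that both data frames are exactly the same
identical(df1_seeded, df1_again)
```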

Example

Loading the dplyr package −

library(dplyr)

Finding the percentile rank of Response within each Group −

Example

df1 %>%
  group_by(Group) %>%
  mutate(Percentile_Rank = rank(Response) / length(Response))

Output

# A tibble: 20 x 3
# Groups:   Group [4]
   Group Response Percentile_Rank
   <chr>    <int>           <dbl>
 1 D            5           0.786
 2 B            7           1
 3 D            5           0.786
 4 C            4           0.75
 5 D            5           0.786
 6 C            5           1
 7 A           10           1
 8 D            3           0.429
 9 B            2           0.333
10 D            0           0.143
11 B            4           0.667
12 D            5           0.786
13 A            3           0.167
14 A            6           0.417
15 D            2           0.286
16 A            7           0.667
17 A            6           0.417
18 C            2           0.25
19 A            9           0.833
20 C            3           0.5
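
Rank divided by group size is only one of several percentile rank conventions. As a sketch for comparison, dplyr also provides percent_rank() and cume_dist(), which treat ties and scaling differently. The data frame toy and the new column names below are hypothetical and only serve to show the difference.

```r
library(dplyr)

# Hypothetical toy data, not the df1 from the example above
toy <- data.frame(
  Group = c("A", "A", "A", "B", "B"),
  Response = c(3, 6, 6, 2, 4)
)

compared <- toy %>%
  group_by(Group) %>%
  mutate(
    # Rank over group size, as in the example above; ties get the average rank
    Rank_Over_N = rank(Response) / n(),
    # (min_rank - 1) / (n - 1), so the smallest value in each group gets 0
    Percent_Rank = percent_rank(Response),
    # Share of values in the group that are <= the current value
    Cume_Dist = cume_dist(Response)
  ) %>%
  ungroup()

compared
```

Which convention fits best depends on whether the smallest value in a group should map to 0 (percent_rank) or to a positive proportion (the other two).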

Example

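# Build a 20-row data frame: a random class label and a normally distributed Y
# (mean 25, standard deviation 3.27)
Class <- sample(c("I", "II", "III"), 20, replace = TRUE)
Y <- rnorm(20, 25, 3.27)
df2 <- data.frame(Class, Y)
df2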

Output

   Class        Y
1    III 32.88152
2    III 23.35048
3    III 19.78199
4    III 26.05137
5      I 26.16563
6    III 20.30466
7      I 22.93382
8     II 30.03620
9      I 16.89365
10     I 27.33329
11     I 27.46550
12   III 27.59028
13    II 27.40766
14   III 23.29442
15    II 28.69237
16    II 31.25723
17    II 22.58002
18   III 22.48583
19     I 26.08357
20   III 24.51681

Finding the percentile rank of Y within each Class −

Example

df2 %>%
  group_by(Class) %>%
  mutate(Percentile_Rank = rank(Y) / length(Y))

Output

# A tibble: 20 x 3
# Groups:   Class [3]
   Class     Y Percentile_Rank
   <chr> <dbl>           <dbl>
 1 III    32.9           1
 2 III    23.4           0.556
 3 III    19.8           0.111
 4 III    26.1           0.778
 5 I      26.2           0.667
 6 III    20.3           0.222
 7 I      22.9           0.333
 8 II     30.0           0.8
 9 I      16.9           0.167
10 I      27.3           0.833
11 I      27.5           1
12 III    27.6           0.889
13 II     27.4           0.4
14 III    23.3           0.444
15 II     28.7           0.6
16 II     31.3           1
17 II     22.6           0.2
18 III    22.5           0.333
19 I      26.1           0.5
20 III    24.5           0.667

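
If dplyr is not available, the same grouped calculation can be sketched in base R with ave(), which applies a function within each level of a grouping variable. The data frame toy2 and the column name Percentile_Rank_Pct are hypothetical; the result is multiplied by 100 here to show the rank on a 0 to 100 scale.

```r
# Hypothetical toy data with the same column names as df2
toy2 <- data.frame(
  Class = c("I", "II", "I", "II", "I"),
  Y = c(26.2, 30.0, 22.9, 27.4, 16.9)
)

# ave() applies the function to Y separately within each Class
# and returns the results in the original row order
toy2$Percentile_Rank_Pct <- 100 * ave(
  toy2$Y, toy2$Class,
  FUN = function(v) rank(v) / length(v)
)

toy2
```
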
raja
Published on 08-Dec-2020 07:17:56