Find the percentiles for multiple columns in an R data frame.


To find percentiles for multiple columns in an R data frame, we can pass the quantile function to apply and supply the desired probabilities through the probs argument.

For example, if we have a data frame called df that contains multiple numeric columns and we want the 25th, 70th, and 90th percentiles (probabilities 0.25, 0.70, and 0.90), we can use the command given below −

apply(df,2,quantile,probs=c(0.25,0.70,0.90))
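As a minimal, self-contained sketch of this command, the snippet below builds a small data frame and inspects the shape of the result. The data frame, its column names a and b, and the seed value are assumptions made for illustration only; they do not come from the examples in this article.

```r
# Hypothetical data frame for illustration; the seed makes the random draw repeatable
set.seed(42)
df <- data.frame(a = rnorm(10), b = rnorm(10))

# MARGIN = 2 applies quantile() to each column; probs is passed through to quantile()
res <- apply(df, 2, quantile, probs = c(0.25, 0.70, 0.90))

# The result is a numeric matrix: one row per probability, one column per data frame column
dim(res)        # 3 2
rownames(res)   # "25%" "70%" "90%"
colnames(res)   # "a" "b"
```

Because the result is a matrix, a single value can be picked out by name, for instance res["90%", "a"].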

Example 1

The following snippet creates a sample data frame −

x1<-rnorm(20)
x2<-rnorm(20)
x3<-rnorm(20)
df1<-data.frame(x1,x2,x3)
df1

The following data frame is created. Because rnorm() draws random numbers, the values will differ from run to run.

           x1          x2          x3
 1 -1.1681428 -0.28065525 -0.53819110
 2  0.2318993 -1.15544267  0.17944881
 3 -0.5333789  0.36613560 -1.48668050
 4 -1.3099335  0.43108366  1.08802308
 5  0.6470196  0.08830738 -0.25840686
 6 -0.1701303  1.87160281 -0.48819826
 7  0.2818403  0.13090818 -0.96722760
 8 -3.0132800 -2.09431074  0.31341228
 9 -0.4261333 -1.16471217  0.93643827
10 -1.0134820  0.60068445 -1.57522191
11  0.7188261 -0.09290046 -1.21396318
12  0.0877293 -1.10543055 -1.03759785
13 -0.8056363  2.37757742  0.27509481
14 -1.1005749 -0.64515153 -0.86212935
15  2.1567133  0.92086077  0.70579629
16  0.3628198  0.01760350  0.51998078
17 -0.7449807 -0.88991305 -0.91787379
18  1.6731441  0.02442096 -0.03178033
19  1.1367622  1.00582342 -0.25280294
20  2.7935713 -0.19143469 -0.14149516

To find several percentiles for every column of df1, add the following code to the above snippet −

# df1 is reused from the snippet above; calling rnorm() again would generate different data
apply(df1,2,quantile,probs=c(0.05,0.10,0.20,0.25,0.30,0.40,0.50,0.60,0.70,0.75,0.80,
0.90,0.95))

Output

Executing the above snippets as a single program generates the following output −

            x1           x2         x3
 5% -1.39510080 -1.21119210 -1.49110757
10% -1.18232188 -1.15636962 -1.24123491
20% -1.03090060 -0.93301655 -0.98130165
25% -0.85759773 -0.70634191 -0.93021224
30% -0.76317738 -0.39000413 -0.87885268
40% -0.46903156 -0.13231415 -0.50819540
50% -0.04120049  0.02101223 -0.25560490
60%  0.25187572  0.10534770 -0.09760923
70%  0.44807978  0.38562001  0.20814261
75%  0.66497124  0.47348386  0.28467418
80%  0.80241330  0.66471971  0.35472598
90%  1.72150104  1.09240136  0.72886048
95%  2.18855624  1.89690154  0.94401751
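One caveat with the apply approach: apply() converts the data frame to a matrix first, so a single character or factor column turns every value into text and quantile() can no longer compute on it. The sketch below shows one possible workaround, keeping only the numeric columns and using sapply instead. The data frame df_mixed and its columns g, u, and v are hypothetical and used only for illustration.

```r
# Hypothetical data frame with one non-numeric column (g)
set.seed(1)
df_mixed <- data.frame(g = letters[1:6], u = runif(6), v = runif(6))

# Logical vector marking which columns are numeric
num_cols <- sapply(df_mixed, is.numeric)

# sapply() works column by column without converting the whole data frame to a matrix
res_mixed <- sapply(df_mixed[num_cols], quantile, probs = c(0.25, 0.50, 0.75))
res_mixed
```

The result has the same layout as the apply output: one row per probability and one column per numeric column.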

Example 2

The following snippet creates a sample data frame −

y1<-rpois(20,1)
y2<-rpois(20,2)
y3<-rpois(20,5)
y4<-rpois(20,5)
df2<-data.frame(y1,y2,y3,y4)
df2

The following data frame is created. Because rpois() draws random numbers, the values will differ from run to run.

  y1 y2 y3 y4
 1 1  2  4  6
 2 1  6  7  5
 3 2  2  6  4
 4 0  4  2  5
 5 1  1  5  6
 6 1  2  2  7
 7 3  2  4  6
 8 2  3  6  5
 9 1  2  7  3
10 0  3  6  4
11 1  0  8  5
12 1  1  3  4
13 4  0  3  3
14 0  1  3  8
15 2  4  5  1
16 2  2  2  1
17 0  1  5  6
18 2  3  2  3
19 0  2  2 11
20 0  2  5  3

To find several percentiles for every column of df2, add the following code to the above snippet −

# df2 is reused from the snippet above; calling rpois() again would generate different data
apply(df2,2,quantile,probs=c(0.05,0.10,0.20,0.25,0.30,0.40,0.50,0.60,0.70,0.75,0.80,
0.90,0.95))

Output

Executing the above snippets as a single program generates the following output −

      y1  y2   y3   y4
 5% 0.00 0.0 2.00 1.00
10% 0.00 0.9 2.00 2.80
20% 0.00 1.0 2.00 3.00
25% 0.00 1.0 2.75 3.00
30% 0.70 1.7 3.00 3.70
40% 1.00 2.0 3.60 4.00
50% 1.00 2.0 4.50 5.00
60% 1.00 2.0 5.00 5.00
70% 2.00 2.3 5.30 6.00
75% 2.00 3.0 6.00 6.00
80% 2.00 3.0 6.00 6.00
90% 2.10 4.0 7.00 7.10
95% 3.05 4.1 7.05 8.15
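The fractional values in this output (for example 3.05 for count data) arise because quantile() by default interpolates between neighbouring observations (type 7). If percentiles that are always observed values are preferred, the type argument of quantile() can be passed through apply as well. The following is a rough sketch under that assumption; the data frame counts and the seed are hypothetical and are not the df2 shown above.

```r
# Hypothetical count data, similar in spirit to df2
set.seed(7)
counts <- data.frame(y1 = rpois(20, 1), y2 = rpois(20, 5))
p <- c(0.25, 0.50, 0.90)

# Default (type 7): linear interpolation, so results may fall between observed counts
default_q <- apply(counts, 2, quantile, probs = p)

# type = 1: inverse of the empirical distribution function, always returns an observed value
observed_q <- apply(counts, 2, quantile, probs = p, type = 1)

default_q
observed_q
```

Which type is appropriate depends on how the percentiles will be used; the default is kept in the examples above.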

Updated on: 09-Nov-2021
