Create a quartile column for each value in an R data frame column.


Any numerical data set can be divided into four parts using three quartiles: the first quartile at 25%, the second quartile (the median) at 50%, and the third quartile at 75%. The resulting four quarters hold the lowest 25% of the values, the second 25%, the third 25%, and the highest 25%.
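
As a small illustration, the three quartiles of a fixed vector can be computed with quantile. The vector v is made up for this sketch, and the results assume R's default quantile method (type 7).

```r
# Hypothetical example vector: the integers 1 to 9
v <- 1:9

# The 25%, 50% and 75% quantiles are the three quartiles
q <- quantile(v, probs = c(0.25, 0.5, 0.75))
print(q)
# 25% 50% 75% 
#   3   5   7 
```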

To create a quartile column (values 1 to 4) for each value in an R data frame column, we can combine the quantile function, which computes the cut points, with the cut function, which assigns each value to the interval it falls in, as shown in the examples below.
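
A minimal sketch of how the two functions fit together, using a made-up vector v rather than the article's data: with its default probs, quantile returns five cut points (the minimum, the three quartiles, and the maximum), and cut assigns each value to one of the four intervals between them.

```r
# Hypothetical example vector
v <- c(2, 4, 6, 8, 10, 12, 14, 16)

# Default probs give the 0%, 25%, 50%, 75% and 100% quantiles
breaks <- quantile(v)

# include.lowest = TRUE keeps the minimum in the first interval;
# labels = FALSE returns integer codes 1 to 4 instead of a factor
quartile <- cut(v, breaks, include.lowest = TRUE, labels = FALSE)
print(quartile)
# [1] 1 1 2 2 3 3 4 4
```

Without include.lowest = TRUE, the smallest value would fall outside the first interval and receive NA.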

Example 1

The following snippet creates a sample data frame −

# Draw 20 distinct integers from 1 to 50 (the values differ on every run)
x<-sample(1:50,20)
df1<-data.frame(x)
df1

The following data frame is created −

    x
1   4
2  44
3   1
4  14
5   5
6  18
7   3
8  16
9  41
10 10
11 35
12 48
13 36
14 23
15 17
16 19
17 11
18 43
19 28
20 29

To create a quartile column for column x of df1, add the following code to the above snippet −

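# Reuse df1 from the snippet above; calling sample() again would draw new values
# quantile() returns the five cut points (0%, 25%, 50%, 75%, 100%)
# labels=FALSE makes cut() return the interval number 1 to 4 instead of a factor
df1$Quartile<-cut(df1$x,quantile(df1$x),include.lowest=TRUE,labels=FALSE)
df1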

Output

If you execute all the above snippets as a single program, it generates the following output for the data frame shown above (sample draws different values on each run) −

    x Quartile
1   4        1
2  44        4
3   1        1
4  14        2
5   5        1
6  18        2
7   3        1
8  16        2
9  41        4
10 10        1
11 35        3
12 48        4
13 36        4
14 23        3
15 17        2
16 19        3
17 11        2
18 43        4
19 28        3
20 29        3
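
As a quick check on the output above, the values of x can be typed in by hand and the rows in each quartile counted with table. This is only a sketch; the vector below copies the 20 values printed in Example 1 instead of sampling new ones.

```r
# The 20 values of x shown in Example 1
x <- c(4, 44, 1, 14, 5, 18, 3, 16, 41, 10,
       35, 48, 36, 23, 17, 19, 11, 43, 28, 29)
df1 <- data.frame(x)
df1$Quartile <- cut(df1$x, quantile(df1$x), include.lowest = TRUE, labels = FALSE)

# Count how many rows fall in each quartile
counts <- table(df1$Quartile)
print(counts)
```

With these 20 distinct values, each of the four quartiles receives five rows.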

Example 2

The following snippet creates a sample data frame −

# Draw 20 values from the standard normal distribution (they differ on every run)
y<-rnorm(20)
df2<-data.frame(y)
df2

The following data frame is created −

             y
1  -0.08949509
2  -0.12711363
3  -0.52805367
4   0.08087408
5  -1.35069115
6   0.13678392
7  -0.41386292
8  -0.80830050
9  -1.13387570
10 -1.56282579
11 -1.27191819
12  1.10834061
13 -1.53450425
14  0.83568645
15 -0.52896185
16  0.45211521
17 -1.45162982
18 -0.63935428
19  1.71258558
20  1.09091493

To create a quartile column for column y of df2, add the following code to the above snippet −

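# Reuse df2 from the snippet above; calling rnorm() again would draw new values
# quantile() supplies the five cut points; labels=FALSE returns the numbers 1 to 4
df2$Quartile<-cut(df2$y,quantile(df2$y),include.lowest=TRUE,labels=FALSE)
df2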

Output

If you execute all the above snippets as a single program, it generates the following output for the data frame shown above (rnorm draws different values on each run) −

             y Quartile
1  -0.08949509        3
2  -0.12711363        3
3  -0.52805367        2
4   0.08087408        3
5  -1.35069115        1
6   0.13678392        3
7  -0.41386292        3
8  -0.80830050        2
9  -1.13387570        2
10 -1.56282579        1
11 -1.27191819        1
12  1.10834061        4
13 -1.53450425        1
14  0.83568645        4
15 -0.52896185        2
16  0.45211521        4
17 -1.45162982        1
18 -0.63935428        2
19  1.71258558        4
20  1.09091493        4
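
Both examples follow the same pattern, so it could be wrapped in a small helper. The function name add_quartile and its arguments are hypothetical and not part of base R. The sketch assumes the column is numeric and that its quantiles are all distinct, since cut stops with an error when the breaks are not unique (which can happen with heavily tied data).

```r
# Hypothetical helper: adds a 1 to 4 quartile column for a numeric column
add_quartile <- function(df, column, new_column = "Quartile") {
  values <- df[[column]]
  stopifnot(is.numeric(values))
  # Five cut points: minimum, three quartiles, maximum
  breaks <- quantile(values, na.rm = TRUE)
  df[[new_column]] <- cut(values, breaks, include.lowest = TRUE, labels = FALSE)
  df
}

# Usage with a seeded sample so the result is repeatable
set.seed(1)
df2 <- data.frame(y = rnorm(20))
df2 <- add_quartile(df2, "y")
print(head(df2))
```

Calling set.seed before rnorm or sample makes the random draw, and therefore the quartile column, the same on every run.
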
raja
Published on 05-Nov-2021 07:51:41