How to split a data frame by column in R?


If a data frame column contains repeated values or represents categories, we may want to split the data frame into one subset for each distinct value of that column.

For example, if we have a data frame called df with a column named Col, we can split the data frame by Col using the following command:

split(df,df$Col)
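As a rough sketch of what this command returns, the snippet below uses a small made-up data frame (the values, the Val column, and the name parts are placeholders for illustration only). split() returns a named list with one data frame per distinct value of the column, and each piece can be pulled out with $ or [[ ]].

```r
# Hypothetical toy data; the values and the Val column are illustrative
df <- data.frame(Col = c("a", "b", "a", "c", "b"),
                 Val = c(10, 20, 30, 40, 50))

# One list element per distinct value of Col
parts <- split(df, df$Col)

names(parts)      # group labels taken from the distinct values of Col
parts$a           # rows of df where Col is "a"
parts[["b"]]$Val  # Val entries for the rows where Col is "b"
```

Each piece keeps the row names it had in the original data frame, which makes it easy to trace rows back to df.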

Example 1

The following snippet creates a sample data frame:

Group<-sample(c("Male","Female"),20,replace=TRUE)
Score<-rpois(20,8)
df1<-data.frame(Group,Score)
df1

This creates the following data frame (the values are sampled at random, so a fresh run will differ):

    Group Score
1    Male     8
2  Female     6
3  Female     5
4  Female     7
5  Female    12
6  Female    10
7  Female     9
8    Male     9
9    Male    10
10 Female    13
11 Female     4
12   Male    11
13 Female     5
14   Male     6
15 Female     9
16   Male     9
17 Female     5
18 Female     8
19   Male     3
20 Female     5

To split df1 by the Group column, add the following line to the above snippet. The data frame is not created again here, because calling sample() and rpois() a second time would draw new values:

split(df1,df1$Group)

Output

Running the above snippets together as a single program produces the following output:

$Female
    Group Score
2  Female     6
3  Female     5
4  Female     7
5  Female    12
6  Female    10
7  Female     9
10 Female    13
11 Female     4
13 Female     5
15 Female     9
17 Female     5
18 Female     8
20 Female     5

$Male
   Group Score
1   Male     8
8   Male     9
9   Male    10
12  Male    11
14  Male     6
16  Male     9
19  Male     3
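A common reason to split a data frame is to process each group separately. As a sketch, the snippet below rebuilds df1 from the values printed above (so it does not depend on random sampling) and uses sapply() to count the rows and average Score in each piece. The names pieces and group_means are illustrative and not part of the original example.

```r
# df1 rebuilt from the printed table above so the result is repeatable
Group <- c("Male", "Female", "Female", "Female", "Female", "Female", "Female",
           "Male", "Male", "Female", "Female", "Male", "Female", "Male",
           "Female", "Male", "Female", "Female", "Male", "Female")
Score <- c(8, 6, 5, 7, 12, 10, 9, 9, 10, 13, 4, 11, 5, 6, 9, 9, 5, 8, 3, 5)
df1 <- data.frame(Group, Score)

pieces <- split(df1, df1$Group)

# Number of rows in each piece
sapply(pieces, nrow)

# Mean Score within each piece
group_means <- sapply(pieces, function(d) mean(d$Score))
group_means
```

Because split() returns an ordinary list, any function that accepts a data frame can be applied to the pieces in the same way.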

Example 2

The following snippet creates a sample data frame:

Class<-sample(c("I","II","III"),20,replace=TRUE)
Number_of_Customers<-sample(1:10,20,replace=TRUE)
df2<-data.frame(Class,Number_of_Customers)
df2

This creates the following data frame (the values vary between runs):

   Class Number_of_Customers
1     II                   2
2      I                  10
3    III                   2
4    III                   4
5    III                   5
6    III                   7
7    III                  10
8     II                   4
9     II                   9
10     I                   7
11   III                   4
12     I                   1
13     I                   1
14     I                   1
15   III                   5
16    II                   5
17    II                   9
18    II                   8
19    II                   8
20     I                   9

To split df2 by the Class column, add the following line to the above snippet. The data frame is not created again here, because calling sample() a second time would draw new values:

split(df2,df2$Class)

Output

Running the above snippets together as a single program produces the following output:

$I
   Class Number_of_Customers
2      I                  10
10     I                   7
12     I                   1
13     I                   1
14     I                   1
20     I                   9

$II
   Class Number_of_Customers
1     II                   2
8     II                   4
9     II                   9
16    II                   5
17    II                   9
18    II                   8
19    II                   8

$III
   Class Number_of_Customers
3    III                   2
4    III                   4
5    III                   5
6    III                   7
7    III                  10
11   III                   4
15   III                   5
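Both examples build their data with random draws, so the rows you get will generally not match the ones printed here. One way to make a run repeatable, shown as a sketch, is to call set.seed() before sampling. The seed value 42 and the name by_class are arbitrary choices for illustration.

```r
# The seed value is arbitrary; any fixed integer makes the draws repeatable
set.seed(42)
Class <- sample(c("I", "II", "III"), 20, replace = TRUE)
Number_of_Customers <- sample(1:10, 20, replace = TRUE)
df2 <- data.frame(Class, Number_of_Customers)

by_class <- split(df2, df2$Class)

# Every row of df2 lands in exactly one piece
sum(sapply(by_class, nrow)) == nrow(df2)
```

Rerunning the whole snippet with the same seed rebuilds the same df2, and therefore the same split.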

Updated on: 09-Nov-2021
