How to convert multiple columns in an R data frame into a single numerical column along with a column having column names as factor?


Data received from any source is rarely in the ideal shape for the intended analysis, so some cleaning or restructuring is usually needed first. For example, the column names of a data frame may actually be the levels of a factor, with each column holding the numerical values for one level. In that case, we may want to reshape the data frame so that all numerical values are stored in a single column and the original column names are stored in a second column as a factor, which is the layout needed to apply analysis of variance techniques. For this purpose, we can use the stack function, as shown in the examples below.

Example

Consider the below data frame −

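# Four columns of 5 random draws each from a standard normal distribution
# (no seed is set, so the values will differ on each run)
x1<-rnorm(5)
x2<-rnorm(5)
x3<-rnorm(5)
x4<-rnorm(5)
df1<-data.frame(x1,x2,x3,x4)
df1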

Output

          x1          x2         x3          x4
1  0.4231515 -0.02059351 -0.7323391  0.19199970
2  0.4816832  0.88382316 -0.5297544  0.17681651
3 -1.1703627  0.16328116 -0.7856500  0.03778934
4 -1.6009281 -0.93433554  1.5626258 -1.51384088
5  2.5075787 -0.94192579 -1.2340071  0.07821619

Stacking data frame df1 −

Example

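# stack() returns two columns: values (the data) and ind (a factor of the original column names)
# Note that this overwrites the wide df1 with its stacked version
df1<-stack(df1)
df1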

Output

        values ind
1   0.42315154  x1
2   0.48168320  x1
3  -1.17036266  x1
4  -1.60092810  x1
5   2.50757869  x1
6  -0.02059351  x2
7   0.88382316  x2
8   0.16328116  x2
9  -0.93433554  x2
10 -0.94192579  x2
11 -0.73233913  x3
12 -0.52975443  x3
13 -0.78564997  x3
14  1.56262579  x3
15 -1.23400706  x3
16  0.19199970  x4
17  0.17681651  x4
18  0.03778934  x4
19 -1.51384088  x4
20  0.07821619  x4
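
Once the data is in this stacked layout, the ind column can serve directly as the grouping factor in a one-way analysis of variance. The following is a minimal sketch of that idea and is not part of the original example: the names wide, long and fit and the seed value are assumptions chosen for illustration, and the data is freshly simulated, so the numbers will not match the output above.

```r
# Hypothetical seed, used only so that this sketch is reproducible
set.seed(42)

# Wide layout: one column per group
wide <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5), x4 = rnorm(5))

# Long layout: values holds the numbers, ind holds the original column names
long <- stack(wide)

# Inspect the column types of the stacked data frame
str(long)

# One-way analysis of variance with ind as the grouping factor
fit <- aov(values ~ ind, data = long)
summary(fit)
```

With simulated data like this, the result only illustrates the workflow; it says nothing about real group differences.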

Example

# Four columns of 5 random Poisson counts each (mean 2)
y1<-rpois(5,2)
y2<-rpois(5,2)
y3<-rpois(5,2)
y4<-rpois(5,2)
df2<-data.frame(y1,y2,y3,y4)
df2

Output

  y1 y2 y3 y4
1  2  6  3  2
2  0  2  4  1
3  1  1  7  3
4  1  1  2  3
5  0  4  2  0

Stacking data frame df2 −

Example

df2<-stack(df2)
df2

Output

   values ind
1       2  y1
2       0  y1
3       1  y1
4       1  y1
5       0  y1
6       6  y2
7       2  y2
8       1  y2
9       1  y2
10      4  y2
11      3  y3
12      4  y3
13      7  y3
14      2  y3
15      2  y3
16      2  y4
17      1  y4
18      3  y4
19      3  y4
20      0  y4
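
If only some of the columns should be stacked, the select argument of stack can be used to choose them. A minimal sketch, assuming a hypothetical data frame named counts with simulated Poisson values (the seed is arbitrary and the name subset_long is only for illustration):

```r
# Hypothetical seed, used only so that this sketch is reproducible
set.seed(1)

# Wide layout with four count columns
counts <- data.frame(y1 = rpois(5, 2), y2 = rpois(5, 2),
                     y3 = rpois(5, 2), y4 = rpois(5, 2))

# Stack only y1 and y3; y2 and y4 are left out of the result
subset_long <- stack(counts, select = c(y1, y3))
subset_long
```

The result has 10 rows instead of 20, and ind contains only the names of the selected columns.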

Example

# Four columns of 5 random exponential draws each (rate 1.02)
z1<-rexp(5,1.02)
z2<-rexp(5,1.02)
z3<-rexp(5,1.02)
z4<-rexp(5,1.02)
df3<-data.frame(z1,z2,z3,z4)
df3

Output

         z1        z2        z3        z4
1 1.2908546 0.7256210 0.3485327 1.2388077
2 0.3096662 0.6603201 1.6009740 1.5944464
3 1.6638942 0.7771325 0.2083197 2.7376839
4 1.5370138 0.1080698 0.7180111 1.3909656
5 0.3302388 1.2617053 0.3907855 0.1516651

Stacking data frame df3 −

Example

df3<-stack(df3)
df3

Output

      values ind
1  1.2908546  z1
2  0.3096662  z1
3  1.6638942  z1
4  1.5370138  z1
5  0.3302388  z1
6  0.7256210  z2
7  0.6603201  z2
8  0.7771325  z2
9  0.1080698  z2
10 1.2617053  z2
11 0.3485327  z3
12 1.6009740  z3
13 0.2083197  z3
14 0.7180111  z3
15 0.3907855  z3
16 1.2388077  z4
17 1.5944464  z4
18 2.7376839  z4
19 1.3909656  z4
20 0.1516651  z4
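
Because the examples above overwrite the wide data frame with its stacked version, it can be useful to know that the reshaping can be reversed with unstack. A minimal sketch, assuming hypothetical names wide, long and back and an arbitrary seed; the data is simulated again, so the values will differ from the output above:

```r
# Hypothetical seed, used only so that this sketch is reproducible
set.seed(7)

# Wide layout with four columns of exponential draws
wide <- data.frame(z1 = rexp(5, 1.02), z2 = rexp(5, 1.02),
                   z3 = rexp(5, 1.02), z4 = rexp(5, 1.02))

# Keep the wide version by storing the stacked result under a new name
long <- stack(wide)

# Reverse the reshaping: one column per level of ind
back <- unstack(long, values ~ ind)
back
```

Assigning the stacked result to a new name, as done here, is a design choice that keeps both layouts available for later steps.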

Updated on: 07-Dec-2020
