How to remove underscore from column names of an R data frame?


When we import data from an external source, the column names often arrive with underscores in them, typically because the original file used that naming format. To make the headers shorter and cleaner, we can remove the underscores with the gsub function, which replaces every match of a pattern in a character vector.

Consider the following data frame −

Example

x_1<-sample(1:10,20,replace=TRUE)
x_2<-sample(1:10,20,replace=TRUE)
x_3<-sample(1:10,20,replace=TRUE)
x_4<-sample(1:10,20,replace=TRUE)
x_5<-sample(1:10,20,replace=TRUE)
df1<-data.frame(x_1,x_2,x_3,x_4,x_5)
df1

Output

x_1 x_2 x_3 x_4 x_5
1 10 4 6 5 10
2 6 10 2 1 4
3 9 9 6 1 4
4 6 1 5 5 8
5 7 7 4 7 4
6 1 5 2 1 8
7 8 5 5 2 9
8 8 4 1 9 8
9 8 1 7 4 3
10 5 9 3 10 3
11 2 7 5 6 9
12 10 1 4 1 5
13 8 10 10 1 2
14 3 10 5 7 6
15 5 6 9 1 10
16 3 8 6 4 7
17 8 9 5 7 2
18 6 10 5 6 8
19 1 8 3 2 9
20 8 1 5 10 5

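Removing the underscores from the column names (only the names change; the values printed below differ from those above, presumably because the random sample was regenerated before this output was captured) −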

Example

# Replace every underscore in the column names with an empty string
names(df1)<-gsub("_","",names(df1))
df1

Output

  x1 x2 x3 x4 x5
1 6 8 2 9 6
2 1 9 3 4 10
3 2 1 8 10 10
4 4 10 3 6 1
5 10 6 6 6 5
6 9 4 6 6 2
7 3 9 10 5 9
8 8 1 5 3 8
9 4 9 2 5 6
10 9 3 3 5 4
11 7 1 4 6 3
12 10 6 3 3 1
13 7 6 10 10 8
14 9 6 4 1 1
15 7 5 10 2 1
16 1 3 7 4 8
17 2 1 7 2 8
18 1 10 8 2 3
19 8 7 6 6 10
20 3 8 9 8 3
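The same idea extends to names that contain more than one underscore. The following is a minimal sketch, assuming a hypothetical data frame df with made-up column names (not taken from the example above); it contrasts sub, which replaces only the first match in each name, with gsub, which replaces all of them. Passing fixed = TRUE treats the pattern as a literal string rather than a regular expression, which is an optional choice here since an underscore has no special meaning in a regex.

```r
# Hypothetical data frame with more than one underscore per column name
df <- data.frame(first_name_id = 1:3, total_sales_usd = c(10, 20, 30))

# sub() replaces only the first underscore in each name
first_only <- sub("_", "", names(df), fixed = TRUE)
first_only

# gsub() replaces every underscore; fixed = TRUE matches "_" literally
names(df) <- gsub("_", "", names(df), fixed = TRUE)
names(df)
```

For column names such as x_1, which contain a single underscore, sub and gsub give the same result; gsub is the safer default when the number of underscores is not known in advance.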

Let’s have a look at another example −

Example

y_1<-rnorm(20)
y_2<-rnorm(20,2,1)
y_3<-rnorm(20,2,0.5)
y_4<-rnorm(20,2,0.0003)
y_5<-rnorm(20,10,1)
df2<-data.frame(y_1,y_2,y_3,y_4,y_5)
df2

Output

        y_1       y_2      y_3       y_4      y_5
1 0.514450792  2.4374182  3.230083 1.999826 12.625661
2 -0.312792686  0.8350701  2.769788 1.999740 8.699441
3 -0.710758168  2.7832089  1.971917 2.000519 8.430542
4 -0.060647019  1.4626953 1.971298 2.000600 9.568890
5 2.363567996  0.8239008  2.626454 2.000266 10.038633
6 1.227010669  2.6716199  1.844929 1.999768 7.838243
7 -0.994717233  1.1798125  2.084188 1.999643 11.254072
8 2.584374114  1.6053897  2.453163 2.000089 11.256447
9 0.863363636  1.0685646  1.457286 2.000659 11.001834
10 -0.190736476  1.4468239  1.829696 2.000229 10.425032
11 0.716178594  2.7498080  2.406190 1.999487 9.906237
12 -1.670744103  1.1184815  2.206973 2.000288 8.993506
13 1.011970392  2.7794836  2.560877 2.000160 12.564313
14 -0.099591556  1.5176429  1.841669 2.000175 12.050816
15 3.230713917  1.8450534  2.065576 2.000189 9.243683
16 0.734370382  0.8649671  1.550325 2.000698 10.320533
17 1.156661539  3.8099910 2.842250 1.999826 10.134682
18 -0.496844480  2.0082680 1.456640 2.000119 10.498172
19 -0.001995988  1.7054230 2.702496 1.999963 8.572382
20 -0.190562902  2.6200714 1.822893 1.999612 9.683227

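Removing the underscores from the column names (again, only the names change; the values shown appear to come from a separate random draw) −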

Example

# Replace every underscore in the column names with an empty string
names(df2)<-gsub("_","",names(df2))
df2

Output

    y1 y2 y3 y4 y5
1 0.35283126 2.7403674 1.5855939 1.999599 10.615962
2 2.04048363 1.7570445 1.9365559 1.999934 10.734033
3 -0.99194313 1.9299296 3.4318183 2.000200 8.821012
4 0.03923376 2.8984508 1.3765896 1.999948 8.371278
5 0.48921437 1.7272755 2.0049735 1.999814 10.769563
6 -1.52296501 1.1843431 1.3387394 1.999670 10.984169
7 -0.43659539 3.0847073 2.0724138 2.000099 10.163438
8 -1.07562516 2.4046583 2.3631921 1.999976 8.119308
9 0.25897051 4.0599361 2.5180669 2.000179 8.780155
10 0.90011031 0.5844179 3.0924616 2.000156 10.945022
11 -1.01455924 1.3601391 1.3491111 2.000197 11.172243
12 -1.21902395 1.5613617 1.6721161 2.000014 9.752595
13 1.10335026 3.0485505 2.5479672 2.000200 10.851384
14 1.66150031 0.9157312 2.0733168 2.000298 10.045139
15 -2.88733135 1.6426962 1.4906487 1.999932 10.596103
16 -0.20689147 1.7962494 0.9636048 1.999893 10.489436
17 -0.66668766 2.0058826 1.7932363 2.000102 10.702172
18 -0.32072057 2.8834813 2.1764040 2.000017 10.699573
19 -0.29862766 4.6416591 2.8638125 1.999819 10.211451
20 -0.47632229 1.2781510 2.8128627 1.999981 9.046588
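One thing worth checking after stripping underscores is whether two columns end up with the same name. The sketch below assumes a hypothetical data frame df whose column names differ only by an underscore (these names are illustrative and not part of the examples above). It uses the base R functions anyDuplicated and make.unique to detect and resolve the clash; this is one possible approach, not the only one.

```r
# Hypothetical data frame where two names differ only by an underscore
df <- data.frame(ab = 1:2, a_b = 3:4)

# Strip the underscores into a separate vector first
new_names <- gsub("_", "", names(df), fixed = TRUE)

# anyDuplicated() returns 0 when all names are unique
if (anyDuplicated(new_names) > 0) {
  # make.unique() appends .1, .2, ... to repeated names
  new_names <- make.unique(new_names)
}

names(df) <- new_names
names(df)
```

Building the new names in a separate vector before assigning them keeps the original names available until the check has passed.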

Updated on: 16-Oct-2020
