How to change the row index after sampling an R data frame?

When we take a random sample of rows from an R data frame, the sampled rows keep the row names they had in the original data frame. This happens because row subsetting preserves row names, and since the rows are chosen at random, the names appear out of order. This can cause confusion during analysis, especially when rows are referred to by number. To avoid it, we can reset the row names so that they run from 1 to the number of rows in the sample.
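To make the possible confusion concrete, here is a minimal sketch. The data frame `d`, its values, and the fixed row selection are made up for illustration (a fixed selection is used instead of `sample()` so the result is reproducible). After subsetting, indexing by position and indexing by row name no longer point to the same row.

```r
# Hypothetical data frame, used only for illustration
d <- data.frame(a = c(10, 20, 30, 40, 50))

# Pick rows 4, 1 and 5 in that order (stands in for a random sample)
s <- d[c(4, 1, 5), , drop = FALSE]

rownames(s)   # "4" "1" "5": names carried over from d
s[1, "a"]     # first row by position -> 40
s["1", "a"]   # row whose name is "1" -> 10
```

A numeric index always means position, while a character index means row name, so the two only agree once the row names have been reset.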

Example

Consider the following data frame:

> set.seed(111)
> x1<-rnorm(20,1.5)
> x2<-rnorm(20,2.5)
> x3<-rnorm(20,3)
> df1<-data.frame(x1,x2,x3)
> df1

Output

             x1         x2       x3
1   1.735220712  2.8616625 1.824274
2   1.169264128  2.8469644 1.878784
3   1.188376176  2.6897365 1.638096
4  -0.802345658  2.3404232 3.481125
5   1.329123955  2.8265492 3.741972
6   1.640278225  3.0982542 3.027825
7   0.002573344  0.6584657 3.331380
8   0.489811581  5.2180556 3.644114
9   0.551524395  2.6912444 5.485662
10  1.006037783  1.1987039 4.959982
11  1.326325872 -0.6132173 3.191663
12  1.093401220  1.5586426 4.552544
13  3.345636264  3.9002588 3.914242
14  1.894054110  0.8795300 3.358625
15  2.297528501  0.2340040 3.175096
16 -0.066665360  3.6629936 2.152732
17  1.414148991  2.3838450 3.978232
18  1.140860519  2.8342560 4.805868
19  0.306391033  1.8791419 3.122915
20  1.864186737  1.1901551 2.870228

Creating a random sample of 5 rows from df1:

> df1_sample<-df1[sample(nrow(df1),5),]
> df1_sample

Output

         x1       x2       x3
18 1.140861 2.834256 4.805868
6  1.640278 3.098254 3.027825
13 3.345636 3.900259 3.914242
5  1.329124 2.826549 3.741972
15 2.297529 0.234004 3.175096

Resetting the row names of the sample so that they run from 1 to 5:

> rownames(df1_sample)<-1:nrow(df1_sample)
> df1_sample

Output

        x1       x2       x3
1 1.140861 2.834256 4.805868
2 1.640278 3.098254 3.027825
3 3.345636 3.900259 3.914242
4 1.329124 2.826549 3.741972
5 2.297529 0.234004 3.175096
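As an alternative to assigning `1:nrow(...)` explicitly, base R also lets you assign `NULL` to the row names of a data frame, which makes it fall back to automatic row names 1, 2, 3, and so on. The sketch below uses a small made-up data frame `d` with a fixed row selection, not the data from the example above.

```r
# Hypothetical data frame, used only for illustration
d <- data.frame(a = c(10, 20, 30, 40, 50))

# Fixed selection standing in for a random sample
s <- d[c(4, 1, 5), , drop = FALSE]

# Drop the inherited row names; R reverts to automatic numbering
rownames(s) <- NULL
rownames(s)   # "1" "2" "3"
```

Both forms give the same visible result; the `NULL` form simply avoids having to compute the number of rows.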

Let’s have a look at another example:

Example

> y1<-runif(20,2,5)
> y2<-runif(20,3,5)
> y3<-runif(20,5,10)
> y4<-runif(20,5,12)
> df2<-data.frame(y1,y2,y3,y4)
> df2

Output

         y1       y2       y3        y4
1  2.881213 4.894022 7.797367  6.487594
2  3.052896 3.223898 7.527572  6.695535
3  2.237543 4.127740 9.864026  8.754048
4  4.475907 4.696651 5.403004  6.239423
5  2.792642 4.023536 7.786222  8.992823
6  2.791539 4.333093 9.480036  6.087904
7  2.271143 3.053019 5.539486  8.320935
8  3.382534 3.212921 7.246406 10.091843
9  4.074728 4.390884 6.544056 10.924127
10 4.546881 3.546689 6.164413 11.710035
11 2.738344 4.489939 9.140333  8.211822
12 3.952763 4.490791 5.564392  7.542578
13 4.040586 3.333465 9.420011 11.554599
14 2.313604 4.959709 8.628101 11.193405
15 2.335957 4.189517 9.601667  9.694433
16 2.646964 4.376438 5.614787 10.929413
17 2.390349 3.343716 9.755718 11.017555
18 3.999001 3.083366 8.348515  8.370818
19 3.463324 3.379700 5.425484  7.219430
20 3.059911 4.522844 7.905784 11.420429

> df2_sample<-df2[sample(nrow(df2),7),]
> df2_sample

Output

         y1       y2       y3        y4
20 3.059911 4.522844 7.905784 11.420429
3  2.237543 4.127740 9.864026  8.754048
10 4.546881 3.546689 6.164413 11.710035
12 3.952763 4.490791 5.564392  7.542578
15 2.335957 4.189517 9.601667  9.694433
18 3.999001 3.083366 8.348515  8.370818
5  2.792642 4.023536 7.786222  8.992823

> rownames(df2_sample)<-1:nrow(df2_sample)
> df2_sample

Output

        y1       y2       y3        y4
1 3.059911 4.522844 7.905784 11.420429
2 2.237543 4.127740 9.864026  8.754048
3 4.546881 3.546689 6.164413 11.710035
4 3.952763 4.490791 5.564392  7.542578
5 2.335957 4.189517 9.601667  9.694433
6 3.999001 3.083366 8.348515  8.370818
7 2.792642 4.023536 7.786222  8.992823
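If sampling and resetting are done often, the two steps can be wrapped in a small function. The sketch below is one possible way to do it: `sample_rows()` is a hypothetical helper name, not a base R function, and the seed and the data frame `df` are arbitrary choices made for illustration.

```r
# sample_rows() is a hypothetical helper, not part of base R
sample_rows <- function(df, n) {
  stopifnot(is.data.frame(df), n <= nrow(df))
  # Draw n row positions without replacement and subset the data frame
  out <- df[sample(nrow(df), n), , drop = FALSE]
  # Reset the row names to 1, 2, ..., n
  rownames(out) <- NULL
  out
}

set.seed(1)  # arbitrary seed, only to make the draw repeatable
df <- data.frame(y1 = runif(20, 2, 5), y2 = runif(20, 3, 5))
res <- sample_rows(df, 7)
rownames(res)
```

Using `drop = FALSE` keeps the result a data frame even when the input has a single column.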

Nizamuddin Siddiqui

Updated on: 04-Sep-2020
