Find the standard deviation of every n observations in an R data frame column


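To find the standard deviation of every n observations in an R data frame column, we can use the rollapply function of the zoo package. Setting both width and by to n makes rollapply compute sd over consecutive, non-overlapping blocks of n values. rep() with each = n then repeats each block's result n times, so that it lines up with the rows of the data frame.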

For example, suppose we have a data frame called df with a column X containing 100 values. We can create a column holding the standard deviation of every 10 values by using the command below. It assumes the zoo package is loaded and the number of rows is a multiple of 10 −

df$SD_10<-rep(rollapply(df$X,width=10,by=10,sd),each=10)
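The command above works in two steps. First it computes one SD per block of 10 values, then it repeats each SD 10 times. The sketch below reproduces both steps in base R without zoo, so the mechanics are visible. It assumes the column length is an exact multiple of the block size. The column name X, the seed, and the simulated data are illustrative only.

```r
# Illustrative data: 100 values in a column named X
set.seed(42)
df <- data.frame(X = rnorm(100))
n <- 10  # block size

# Step 1: place each block of n consecutive values in its own matrix column
# (matrix() fills column-wise, so column 1 holds rows 1-10, column 2 rows 11-20, ...)
# and take the sample standard deviation of every column.
# This requires nrow(df) to be a multiple of n.
block_sd <- apply(matrix(df$X, nrow = n), 2, sd)

# Step 2: repeat each block SD n times so it lines up with the rows
df$SD_10 <- rep(block_sd, each = n)
head(df, 12)
```

This matrix version and the rollapply version both depend on the row count being divisible by n. If that is not guaranteed, use an explicit block index instead.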

Example 1

The following snippet creates a sample data frame −

x<-rpois(20,5)
df1<-data.frame(x)
df1

The following data frame is created. Because rpois generates random values, your numbers will differ −

    x
1   3
2   5
3   5
4   1
5   5
6   4
7   5
8   5
9   4
10  3
11  4
12  1
13  7
14  2
15  6
16 10
17  5
18  9
19  4
20  4

To load the zoo package and find the standard deviation of every 5 values in x, add the following code to the above snippet −

library(zoo)
df1$SD_5<-rep(rollapply(df1[,1],width=5,by=5,sd),each=5)
df1

Output

If you execute all the above snippets as a single program, it generates the following output −

    x     SD_5
1   3 1.788854
2   5 1.788854
3   5 1.788854
4   1 1.788854
5   5 1.788854
6   4 0.836660
7   5 0.836660
8   5 0.836660
9   4 0.836660
10  3 0.836660
11  4 2.549510
12  1 2.549510
13  7 2.549510
14  2 2.549510
15  6 2.549510
16 10 2.880972
17  5 2.880972
18  9 2.880972
19  4 2.880972
20  4 2.880972
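The SD_5 column can be checked by hand. The sketch below hard-codes the 20 values of x printed above, so the result does not depend on the random draw. It then computes the sample SD of each block of 5 with base R's tapply() instead of rollapply(). The grouping vector rep(1:4, each = 5) is an illustrative choice that labels rows 1-5, 6-10, 11-15 and 16-20.

```r
# The 20 values of x printed in Example 1, hard-coded for reproducibility
x <- c(3, 5, 5, 1, 5, 4, 5, 5, 4, 3, 4, 1, 7, 2, 6, 10, 5, 9, 4, 4)

# Label rows 1-5 as block 1, rows 6-10 as block 2, and so on,
# then take the sample standard deviation within each block
block_sd <- tapply(x, rep(1:4, each = 5), sd)

# Rounded to 6 decimals: 1.788854 0.836660 2.549510 2.880972
round(block_sd, 6)
```

For the first block (3, 5, 5, 1, 5) the mean is 3.8 and the squared deviations sum to 12.8. Dividing by n − 1 = 4 gives a variance of 3.2, and its square root is 1.788854. This matches the first value of SD_5, which confirms that sd() uses the sample (n − 1) denominator.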

Example 2

The following snippet creates a sample data frame −

y<-rnorm(20)
df2<-data.frame(y)
df2

The following data frame is created. Because rnorm generates random values, your numbers will differ −

             y
1  -0.59258077
2   0.44336315
3   1.03389921
4  -0.50471102
5  -0.10370441
6   1.49547406
7   0.18575630
8  -0.73030467
9  -1.15666426
10  1.68174045
11 -0.03226993
12 -0.49435218
13 -1.98371898
14  2.04194072
15  2.44473953
16  0.26519508
17 -0.36658534
18 -0.15745538
19  0.15730767
20  0.91778671

To find the standard deviation of every 4 values in y, add the following code to the above snippet −

library(zoo)
df2$SD_4<-rep(rollapply(df2[,1],width=4,by=4,sd),each=4)
df2

Output

If you execute all the above snippets as a single program, it generates the following output −

             y      SD_4
1  -0.59258077 0.7821571
2   0.44336315 0.7821571
3   1.03389921 0.7821571
4  -0.50471102 0.7821571
5  -0.10370441 0.9373014
6   1.49547406 0.9373014
7   0.18575630 0.9373014
8  -0.73030467 0.9373014
9  -1.15666426 1.2126483
10  1.68174045 1.2126483
11 -0.03226993 1.2126483
12 -0.49435218 1.2126483
13 -1.98371898 2.0195767
14  2.04194072 2.0195767
15  2.44473953 2.0195767
16  0.26519508 2.0195767
17 -0.36658534 0.5628322
18 -0.15745538 0.5628322
19  0.15730767 0.5628322
20  0.91778671 0.5628322
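Both examples use 20 rows, which is an exact multiple of 5 and of 4. If the row count is not a multiple of n, rollapply(..., width = n, by = n) returns fewer block values than there are rows, and the lengths produced by rep(..., each = n) no longer match the data frame. Below is a minimal base-R sketch of a hypothetical helper, sd_every_n(), that avoids this. It assigns each row an explicit block index and uses ave(), so a final partial block gets its own SD, or NA if that block has only one value. The helper name and the sample values are illustrative assumptions.

```r
# Hypothetical helper: standard deviation of each consecutive block of n values.
# Returns a vector as long as v; a shorter final block gets its own SD
# (NA when that block contains a single value, since sd() needs two).
sd_every_n <- function(v, n) {
  block <- (seq_along(v) - 1) %/% n   # block ids: 0,0,0,0,1,1,1,1,2,...
  ave(v, block, FUN = sd)
}

# 10 values: two full blocks of 4 plus a final block of 2
y <- c(2, 4, 4, 4, 5, 5, 7, 9, 1, 3)
df3 <- data.frame(y = y)
df3$SD_4 <- sd_every_n(df3$y, 4)
df3
```

Every row belongs to exactly one block, so the result always has one value per row. This holds whether or not the row count is a multiple of n.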

Updated on: 11-Nov-2021
