How to find the difference between each row's value and the previous row's value by group in an R data frame?
In data analysis, we sometimes need the difference between each value and the value in the previous row, calculated separately within each group. This lets us compare consecutive values inside a group without mixing values from different groups. In R, we can do this with the dplyr package's group_by and mutate functions together with lag.
Example
Consider the below data frame −
> Group<-rep(LETTERS[1:5],each=4)
> Frequency<-sample(1:20,20,replace=TRUE)
> df1<-data.frame(Group,Frequency)
> df1
Output
   Group Frequency
1      A         7
2      A         6
3      A         9
4      A        12
5      B        19
6      B        19
7      B         4
8      B         6
9      C        14
10     C         6
11     C         6
12     C        20
13     D         2
14     D        11
15     D        14
16     D        19
17     E        14
18     E         7
19     E         3
20     E         1
Loading dplyr package −
> library(dplyr)
Subtracting the previous Frequency from the current Frequency within each Group −
> df1%>%group_by(Group)%>%mutate(Difference=Frequency-lag(Frequency,default=first(Frequency)))
Output
# A tibble: 20 x 3
# Groups:   Group [5]
   Group Frequency Difference
   <fct>     <int>      <int>
 1 A             7          0
 2 A             6         -1
 3 A             9          3
 4 A            12          3
 5 B            19          0
 6 B            19          0
 7 B             4        -15
 8 B             6          2
 9 C            14          0
10 C             6         -8
11 C             6          0
12 C            20         14
13 D             2          0
14 D            11          9
15 D            14          3
16 D            19          5
17 E            14          0
18 E             7         -7
19 E             3         -4
20 E             1         -2
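The first row of each group shows a Difference of 0 because default=first(Frequency) makes lag return the group's own first value where there is no previous row. The following is a minimal sketch of that behaviour, assuming a small made-up data frame (toy, Value, Diff_NA and Diff_Zero are hypothetical names, not part of the example above). It compares lag without a default, which yields NA, against lag with the default used in this article.

```r
library(dplyr)

# Hypothetical data frame, small enough to check by hand
toy <- data.frame(Group = c("A", "A", "A", "B", "B"),
                  Value = c(5, 8, 6, 10, 4))

result <- toy %>%
  group_by(Group) %>%
  mutate(
    # No default: the first row of each group has no previous value, so NA
    Diff_NA = Value - lag(Value),
    # default = first(Value): the first row is compared with itself, giving 0
    Diff_Zero = Value - lag(Value, default = first(Value))
  ) %>%
  ungroup()

print(result)
```

Leaving the first row as NA may be preferable when a 0 could be mistaken for "no change from the previous row".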
Let’s have a look at another example −
Example
> x<-rep(c("S1","S2","S3","S4","S5"),times=4)
> y<-rnorm(20)
> df2<-data.frame(x,y)
> df2
Output
    x          y
1  S1 -0.2648554
2  S2 -1.6024447
3  S3 -0.3668267
4  S4  0.6439787
5  S5  1.9406125
6  S1  1.8398485
7  S2  1.5151748
8  S3 -0.7975164
9  S4 -1.4744469
10 S5 -0.4300237
11 S1 -1.2181901
12 S2 -0.9504064
13 S3  1.0594684
14 S4 -0.3190330
15 S5 -0.4186285
16 S1  0.2418591
17 S2  0.4273363
18 S3  1.2725779
19 S4  0.1008520
20 S5  0.0362863
Subtracting the previous y from the current y within each x group −
> df2%>%group_by(x)%>%mutate(Difference=y-lag(y,default=first(y)))
Output
# A tibble: 20 x 3
# Groups:   x [5]
   x           y Difference
   <fct>   <dbl>      <dbl>
 1 S1     -0.265          0
 2 S2     -1.60           0
 3 S3     -0.367          0
 4 S4      0.644          0
 5 S5      1.94           0
 6 S1      1.84        2.10
 7 S2      1.52        3.12
 8 S3     -0.798     -0.431
 9 S4     -1.47       -2.12
10 S5     -0.430      -2.37
11 S1     -1.22       -3.06
12 S2     -0.950      -2.47
13 S3      1.06        1.86
14 S4     -0.319       1.16
15 S5     -0.419     0.0114
16 S1      0.242       1.46
17 S2      0.427       1.38
18 S3      1.27       0.213
19 S4      0.101      0.420
20 S5     0.0363      0.455
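In this second example the groups are interleaved, so the "previous" value for a row is the most recent earlier row with the same x, not the row directly above it. For instance, row 6 (S1) is compared with row 1 (S1). The same result can also be obtained without dplyr. The following is a minimal base R sketch using ave and diff, offered as an alternative rather than the method used above; toy2 and its values are made up for illustration.

```r
# Hypothetical interleaved data: groups S1 and S2 alternate row by row
toy2 <- data.frame(x = c("S1", "S2", "S1", "S2", "S1"),
                   y = c(1, 10, 4, 7, 9))

# ave() applies FUN to y within each level of x and returns the results
# in the original row order. c(0, diff(v)) is the difference from the
# previous value in the group, with 0 for the group's first row.
toy2$Difference <- ave(toy2$y, toy2$x, FUN = function(v) c(0, diff(v)))

print(toy2)
```

Here the S1 values 1, 4, 9 give differences 0, 3, 5 and the S2 values 10, 7 give 0, -3, each placed back on the row it came from.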