Real-world data is often messy, which is one reason data analysis is difficult. Most data scientists would prefer clean data, but it is rarely available because data warehouses often focus on data availability rather than data quality. One head-scratching situation is an unwanted character showing up at arbitrary positions in the values; the $ sign is a common example. We can remove it from every column of an R data frame by applying gsub to the columns with the lapply function.
Consider the following data frame:
> x<-sample(c("A","$B","C"),20,replace=TRUE)
> y<-sample(c("I","II","$II"),20,replace=TRUE)
> df1<-data.frame(x,y)
> df1
    x   y
1   C $II
2   C  II
3   A   I
4  $B $II
5  $B $II
6   A   I
7   A $II
8   C   I
9  $B  II
10 $B  II
11  C $II
12  A  II
13 $B  II
14  C   I
15  C $II
16  C   I
17  C  II
18 $B   I
19 $B  II
20  C $II
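Removing the $ sign from every column of df1 (lapply returns a list, so the result prints as a list of character vectors rather than as a data frame):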
> df1<-lapply(df1,gsub,pattern='\\$',replacement='')
> df1
$x
 [1] "C" "C" "A" "B" "B" "A" "A" "C" "B" "B" "C" "A" "B" "C" "C" "C" "C" "B" "B"
[20] "C"

$y
 [1] "II" "II" "I" "II" "II" "I" "II" "I" "II" "II" "II" "II" "II" "I" "II"
[16] "I" "II" "I" "II" "II"
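Because lapply returns a list, the result above is no longer a data frame. If the data frame structure should be kept, one common approach is to assign the result back into the existing columns. The following is a minimal sketch, assuming a small hypothetical data frame named df with hand-picked values (not the randomly sampled df1):

```r
# Hypothetical data frame with fixed values, same shape as df1
df <- data.frame(x = c("A", "$B", "C", "$B"),
                 y = c("I", "$II", "II", "$II"),
                 stringsAsFactors = FALSE)

# Assigning into df[] replaces the column contents
# but keeps the object a data frame
df[] <- lapply(df, gsub, pattern = "\\$", replacement = "")

class(df)  # check that the class is unchanged
df
```

The empty brackets in df[] tell R to fill the existing columns instead of replacing the whole object with the list that lapply returns.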
Let’s have a look at another example:
> Price<-sample(c("1$","2$","3$","4$"),20,replace=TRUE)
> Group<-sample(c("$First","$Second","Third"),20,replace=TRUE)
> df2<-data.frame(Price,Group)
> df2
   Price   Group
1     3$ $Second
2     2$   Third
3     1$   Third
4     2$ $Second
5     2$  $First
6     4$  $First
7     2$  $First
8     3$  $First
9     2$   Third
10    4$   Third
11    3$  $First
12    3$   Third
13    3$ $Second
14    2$  $First
15    4$   Third
16    3$  $First
17    4$   Third
18    2$  $First
19    2$ $Second
20    3$   Third
Removing the $ sign from every column of df2:
> df2<-lapply(df2,gsub,pattern='\\$',replacement='')
> df2
$Price
 [1] "3" "2" "1" "2" "2" "4" "2" "3" "2" "4" "3" "3" "3" "2" "4" "3" "4" "2" "2"
[20] "3"

$Group
 [1] "Second" "Third" "Third" "Second" "First" "First" "First" "First"
 [9] "Third" "Third" "First" "Third" "Second" "First" "Third" "First"
[17] "Third" "First" "Second" "Third"
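The cleaned Price values are still character strings, so they cannot be used in arithmetic yet. The sketch below, again on a small hypothetical data frame named df rather than the sampled df2, shows two optional variations: passing fixed = TRUE so that "$" is treated as a literal character instead of a regular expression (an unescaped "$" in a regular expression matches the end of the string, which is why the examples above escape it as '\\$'), and converting the cleaned column with as.numeric.

```r
# Hypothetical data frame with fixed values, same shape as df2
df <- data.frame(Price = c("3$", "2$", "1$", "4$"),
                 Group = c("$Second", "Third", "$First", "Third"),
                 stringsAsFactors = FALSE)

# fixed = TRUE matches "$" literally, so no escaping is needed
df[] <- lapply(df, gsub, pattern = "$", replacement = "", fixed = TRUE)

# Convert the cleaned Price column from character to numeric
df$Price <- as.numeric(df$Price)

str(df)
```

After the conversion, functions such as mean or sum can be applied to df$Price directly.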