How to replace $ sign combined with some specific values in an R data frame?
Real-world data is often dirty, which is one reason data analysis is a difficult task. Most data scientists want clean data, but that is rarely available because data warehouses often focus on data availability rather than data quality. One head-scratching situation is an unwanted character placed at arbitrary positions in the values, and the $ sign is one such character. We can remove it from every column of an R data frame by applying gsub to each column with the lapply function. Because $ is a regular-expression metacharacter, it must be escaped in the pattern. Note that lapply returns a list of cleaned columns rather than a data frame.
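As a minimal sketch of why the pattern needs care, the snippet below compares three gsub calls on a small character vector. The vector `values` and the result names are made up for illustration; they are not part of the data frames in this article.

```r
# Hypothetical sample values containing a literal dollar sign
values <- c("$B", "3$", "C")

# An unescaped "$" is the end-of-string anchor, so nothing is removed
anchored <- gsub("$", "", values)

# Escaping the dollar sign makes the regex match the literal character
escaped <- gsub("\\$", "", values)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
literal <- gsub("$", "", values, fixed = TRUE)

print(anchored)
print(escaped)
print(literal)
```

Either the escaped pattern or fixed = TRUE should work here; fixed = TRUE is arguably easier to read when the pattern is a single literal character.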
Example
Consider the below data frame:
> x <- sample(c("A", "$B", "C"), 20, replace = TRUE)
> y <- sample(c("I", "II", "$II"), 20, replace = TRUE)
> df1 <- data.frame(x, y)
> df1
Output
    x   y
1   C $II
2   C  II
3   A   I
4  $B $II
5  $B $II
6   A   I
7   A $II
8   C   I
9  $B  II
10 $B  II
11  C $II
12  A  II
13 $B  II
14  C   I
15  C $II
16  C   I
17  C  II
18 $B   I
19 $B  II
20  C $II
Removing the $ sign from every value in df1:
Example
> df1 <- lapply(df1, gsub, pattern = '\\$', replacement = '')
> df1
Output
$x
 [1] "C" "C" "A" "B" "B" "A" "A" "C" "B" "B" "C" "A" "B" "C" "C" "C" "C" "B" "B"
[20] "C"

$y
 [1] "II" "II" "I"  "II" "II" "I"  "II" "I"  "II" "II" "II" "II" "II" "I"  "II"
[16] "I"  "II" "I"  "II" "II"
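The output above is a list with elements $x and $y, because lapply always returns a list. If the result should stay a data frame, one common idiom is to assign into `df[]`, which replaces the columns while keeping the data frame structure. The sketch below uses a small hand-written data frame named `df` (an assumption for illustration, not the randomly sampled df1) so that the result is reproducible.

```r
# Hypothetical three-row data frame with stray dollar signs
df <- data.frame(
  x = c("C", "$B", "A"),
  y = c("$II", "II", "I"),
  stringsAsFactors = FALSE
)

# Assigning into cleaned[] keeps the data.frame class and column names;
# a plain `cleaned <- lapply(...)` would produce a list instead
cleaned <- df
cleaned[] <- lapply(df, gsub, pattern = "\\$", replacement = "")

print(class(cleaned))
print(cleaned)
```

Wrapping the lapply result in as.data.frame() is another option that should give an equivalent result for this kind of input.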
Let’s have a look at another example:
Example
> Price <- sample(c("1$", "2$", "3$", "4$"), 20, replace = TRUE)
> Group <- sample(c("$First", "$Second", "Third"), 20, replace = TRUE)
> df2 <- data.frame(Price, Group)
> df2
Output
   Price   Group
1     3$ $Second
2     2$   Third
3     1$   Third
4     2$ $Second
5     2$  $First
6     4$  $First
7     2$  $First
8     3$  $First
9     2$   Third
10    4$   Third
11    3$  $First
12    3$   Third
13    3$ $Second
14    2$  $First
15    4$   Third
16    3$  $First
17    4$   Third
18    2$  $First
19    2$ $Second
20    3$   Third
Removing the $ sign from every value in df2:
Example
> df2 <- lapply(df2, gsub, pattern = '\\$', replacement = '')
> df2
Output
$Price
 [1] "3" "2" "1" "2" "2" "4" "2" "3" "2" "4" "3" "3" "3" "2" "4" "3" "4" "2" "2"
[20] "3"

$Group
 [1] "Second" "Third"  "Third"  "Second" "First"  "First"  "First"  "First"
 [9] "Third"  "Third"  "First"  "Third"  "Second" "First"  "Third"  "First"
[17] "Third"  "First"  "Second" "Third"
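Note that the cleaned Price values are still character strings such as "3", and that gsub would also turn any numeric column into character if it were applied to the whole data frame. A hedged sketch of one way to handle both points is shown below: clean only the character columns, then convert Price to numeric. The data frame `df`, its `Qty` column, and the helper vector `is_chr` are assumptions added for illustration.

```r
# Hypothetical data frame: two character columns and one integer column
df <- data.frame(
  Price = c("3$", "2$"),
  Group = c("$Second", "Third"),
  Qty = c(5L, 7L),
  stringsAsFactors = FALSE
)

# Find the character columns so the integer column is left untouched
is_chr <- vapply(df, is.character, logical(1))

# Remove the literal dollar sign from the character columns only
df[is_chr] <- lapply(df[is_chr], gsub,
                     pattern = "$", replacement = "", fixed = TRUE)

# With the dollar sign gone, Price can be converted to a number
df$Price <- as.numeric(df$Price)

str(df)
```

Whether Price should become numeric depends on the analysis; the conversion step can be dropped if the values are only used as labels.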