How to standardize columns if some columns are categorical in an R data frame?
To standardize the numerical columns of an R data frame that also contains categorical columns, we can follow the steps below −
First, create a data frame.
Then, use the numcolwise function from the plyr package to apply scale to the numerical columns only; the categorical columns are left out of the result.
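To make clear what standardizing does to a single column, here is a minimal sketch. The vector x is a hypothetical example, not data from this article. It shows that scale, with its default arguments, subtracts the mean and divides by the sample standard deviation.

```r
# Hypothetical example vector (an assumption, not taken from the article)
x <- c(2, 4, 6, 8, 10)

# Manual standardization: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() reduces it to a plain vector
z_scale <- as.numeric(scale(x))

# Compare the two results up to floating-point tolerance
all.equal(z_manual, z_scale)
```

After standardization the column has mean 0 and standard deviation 1, which is why the operation is only meaningful for numerical columns.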
Example
Create the data frame
Let’s create a data frame as shown below −
Level<-sample(c("low","medium","high"),25,replace=TRUE)
Group<-sample(c("first","second"),25,replace=TRUE)
Score<-sample(1:50,25)
Demand<-sample(1:100,25)
df<-data.frame(Level,Group,Score,Demand)
df
Output
On execution, the above script generates the output below (it will vary on your system due to randomization) −
    Level  Group Score Demand
 1   high second    37     31
 2 medium second    43     89
 3 medium second    31     50
 4 medium  first    16     60
 5    low  first    23     29
 6    low second     3     26
 7 medium  first    26     55
 8    low  first     9      9
 9   high second    14      3
10 medium  first    33     36
11    low second    15     34
12 medium  first    19     85
13 medium second    41     71
14 medium second    50     78
15    low second    36     69
16 medium second    17     49
17   high second    38     76
18   high  first     6     99
19    low  first    35     91
20 medium  first    39     12
21    low  first    47     62
22 medium second    45     59
23 medium  first    44     64
24 medium second    21     25
25   high second    11     57
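Standardize the numerical columns when some columns are categorical
Use the numcolwise function from the plyr package to standardize the numerical columns of the data frame df. Note that the script below draws a fresh random sample, so its output does not correspond to the data frame printed above −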
Level<-sample(c("low","medium","high"),25,replace=TRUE)
Group<-sample(c("first","second"),25,replace=TRUE)
Score<-sample(1:50,25)
Demand<-sample(1:100,25)
df<-data.frame(Level,Group,Score,Demand)
library(plyr)
# Apply scale() to the numerical columns only
numcolwise(scale)(df)
Output
         Score      Demand
 1 -0.02029767  0.99612442
 2 -0.81770624  0.46627101
 3  0.41465246  0.61765770
 4 -0.38275611 -1.76668267
 5  1.42953609  1.37459115
 6  1.06707765  1.03397109
 7  0.63212752 -1.31252260
 8 -0.60023118 -0.36635579
 9  0.99458596  0.73119771
10 -0.16528105  0.69335104
11 -1.25265637 -0.70697584
12  0.84960258 -0.85836253
13 -0.52773949 -0.10142908
14  0.19717739 -1.69098933
15 -1.10767299  0.84473773
16 -1.47013143 -1.27467593
17  0.55963583  1.14751111
18 -1.83258988 -1.50175597
19  1.35704440 -0.17712243
20 -0.74521455  0.54196435
21 -0.67272287 -0.06358241
22  1.28455271 -0.25281577
23  0.48714414  0.76904439
24  1.50202778 -0.59343583
25 -1.18016468  1.45028449
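The numcolwise result contains only the standardized numerical columns. If the categorical columns should stay in the data frame, one possible alternative is the base R sketch below. This is a swapped-in technique, not the plyr approach used above, and the seed value and the names num_cols and df_std are assumptions chosen for illustration.

```r
set.seed(123)  # assumed seed, used only so this sketch is reproducible

df <- data.frame(
  Level  = sample(c("low", "medium", "high"), 25, replace = TRUE),
  Group  = sample(c("first", "second"), 25, replace = TRUE),
  Score  = sample(1:50, 25),
  Demand = sample(1:100, 25)
)

# Logical vector marking which columns are numerical
num_cols <- sapply(df, is.numeric)

# Copy the data frame and overwrite only the numerical columns;
# as.numeric() turns the one-column matrix returned by scale() into a vector
df_std <- df
df_std[num_cols] <- lapply(df[num_cols], function(x) as.numeric(scale(x)))

head(df_std)
```

Here Level and Group are carried over unchanged, while Score and Demand are replaced by their standardized values.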