What are levels in a column of a data frame in R?
Newcomers to R often confuse levels with characters. Levels are the set of distinct categories that a factor column can take. A character column simply stores strings: it is not a factor and has no levels, although it can be converted to a factor.
Example
Consider the below data frame −
> x1<-factor(sample(LETTERS[1:4],20,replace=TRUE))
> x2<-sample(LETTERS[1:4],20,replace=TRUE)
> df1<-data.frame(x1,x2)
> df1
Output
   x1 x2
1   B  B
2   B  A
3   D  D
4   D  C
5   C  A
6   D  C
7   A  D
8   D  B
9   D  C
10  B  B
11  C  B
12  D  A
13  C  D
14  B  B
15  C  B
16  C  A
17  B  A
18  D  C
19  C  B
20  D  D
Looking at the structure of df1 shows the difference between a factor column and a character column −
> str(df1)
'data.frame': 20 obs. of 2 variables:
 $ x1: Factor w/ 4 levels "A","B","C","D": 2 2 4 4 3 4 1 4 4 2 ...
 $ x2: chr "B" "A" "D" "C" ...
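As a small follow-up sketch, the levels of each column can be inspected directly with `levels()`. The vectors below are short, fixed stand-ins chosen for illustration (they are not the article's random sample), and `stringsAsFactors = FALSE` is passed explicitly on the assumption that the character column should stay a character column regardless of R version.

```r
# Hypothetical fixed values, used so the result does not depend on sample()
x1 <- factor(c("B", "B", "D", "C", "A"))   # factor column
x2 <- c("B", "A", "D", "C", "A")           # plain character column
df1_small <- data.frame(x1, x2, stringsAsFactors = FALSE)

# A factor column carries a levels attribute: its distinct categories
levels(df1_small$x1)

# A character column has no levels, so levels() gives NULL
levels(df1_small$x2)

# Converting the character column to a factor creates the levels
df1_small$x2 <- factor(df1_small$x2)
levels(df1_small$x2)
nlevels(df1_small$x2)   # number of levels
```

After the conversion, both columns report the same four levels, because `factor()` builds them from the sorted unique values of the column.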
Example
> y1<-factor(sample(c("Winter","Spring","Summer"),20,replace=TRUE))
> y2<-rnorm(20)
> df2<-data.frame(y1,y2)
> df2
Output
       y1         y2
1  Summer -0.9006581
2  Winter  0.8897190
3  Summer  0.2585291
4  Spring  1.5118381
5  Winter -1.0277900
6  Winter  0.1853884
7  Spring  0.1425927
8  Spring -0.1824645
9  Summer  1.6294306
10 Summer  1.3320479
11 Spring -0.1468691
12 Spring  0.7244621
13 Spring -0.4379905
14 Spring  1.0983712
15 Summer -1.0212200
16 Winter  0.5164757
17 Summer  2.2103486
18 Summer  0.6049139
19 Winter -0.1642906
20 Spring  1.5057525
Looking at the structure of df2 shows the difference between a factor column and a numeric column −
> str(df2)
'data.frame': 20 obs. of 2 variables:
 $ y1: Factor w/ 3 levels "Spring","Summer",..: 2 3 2 1 3 3 1 1 2 2 ...
 $ y2: num -0.901 0.89 0.259 1.512 -1.028 ...
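The integers printed after the levels in the `str()` output are the internal codes of the factor: each value is stored as the position of its level. The sketch below reuses only the first four seasons from the output above (Summer, Winter, Summer, Spring) so that it does not depend on random sampling; the name `y1_custom` and the chosen level order are illustrative assumptions, not part of the original example.

```r
# First four values of y1 from the example output, typed in by hand
y1 <- factor(c("Summer", "Winter", "Summer", "Spring"))

# By default the levels are the sorted unique values
levels(y1)        # "Spring" "Summer" "Winter"

# Each element is stored as the index of its level
as.integer(y1)    # 2 3 2 1

# Supplying levels explicitly changes the order, and therefore the codes
y1_custom <- factor(y1, levels = c("Winter", "Spring", "Summer"))
levels(y1_custom)
as.integer(y1_custom)

# A numeric column has no levels at all
levels(c(-0.9006581, 0.8897190))   # NULL
```

The default alphabetical order explains why "Spring" is level 1 in the `str()` output even though "Winter" was listed first in the call to `sample()`.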