What are levels in a column of a data frame in R?


Levels and characters are easy to confuse in R, especially for beginners. Levels are the set of distinct values that a factor column can take, so they belong only to factors. A character column simply stores strings; it is not a factor and has no levels, but it can be converted to a factor, at which point its unique values become the levels.
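A minimal sketch of that distinction, using a small made-up vector (the names `ch` and `f` are illustrative only and are not part of the examples that follow):

```r
# A plain character vector: it carries no levels
ch <- c("B", "A", "D", "A")
levels(ch)        # NULL
is.character(ch)  # TRUE

# Converting it to a factor creates the levels from the sorted unique values
f <- factor(ch)
levels(f)         # "A" "B" "D"
is.factor(f)      # TRUE
```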

Example

Consider the below data frame −

> x1<-factor(sample(LETTERS[1:4],20,replace=TRUE))
> x2<-sample(LETTERS[1:4],20,replace=TRUE)
> df1<-data.frame(x1,x2)
> df1

Output

x1 x2
1 B B
2 B A
3 D D
4 D C
5 C A
6 D C
7 A D
8 D B
9 D C
10 B B
11 C B
12 D A
13 C D
14 B B
15 C B
16 C A
17 B A
18 D C
19 C B
20 D D

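Looking at the structure of df1 to understand the difference between factor and character columns. Note that x2 stays a character column because, in R 4.0.0 and later, data.frame() does not convert strings to factors by default −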

> str(df1)
'data.frame': 20 obs. of 2 variables:
$ x1: Factor w/ 4 levels "A","B","C","D": 2 2 4 4 3 4 1 4 4 2 ...
$ x2: chr "B" "A" "D" "C" ...
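The same difference can be checked directly with levels(). The sketch below rebuilds a data frame like df1 so that it runs on its own; the set.seed() call and the explicit stringsAsFactors = FALSE argument are additions for reproducibility and for consistent behaviour on older R versions, and are not part of the original example.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
x1 <- factor(sample(LETTERS[1:4], 20, replace = TRUE))
x2 <- sample(LETTERS[1:4], 20, replace = TRUE)
df1 <- data.frame(x1, x2, stringsAsFactors = FALSE)

# The factor column has levels; the character column does not
levels(df1$x1)
x2_levels_before <- levels(df1$x2)  # NULL for a character column

# Converting the character column to a factor gives it levels
df1$x2 <- factor(df1$x2)
levels(df1$x2)
str(df1)
```

After the conversion, str(df1) should report both columns as factors.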

Example

> y1<-factor(sample(c("Winter","Spring","Summer"),20,replace=TRUE))
> y2<-rnorm(20)
> df2<-data.frame(y1,y2)
> df2

Output

y1 y2
1 Summer -0.9006581
2 Winter 0.8897190
3 Summer 0.2585291
4 Spring 1.5118381
5 Winter -1.0277900
6 Winter 0.1853884
7 Spring 0.1425927
8 Spring -0.1824645
9 Summer 1.6294306
10 Summer 1.3320479
11 Spring -0.1468691
12 Spring 0.7244621
13 Spring -0.4379905
14 Spring 1.0983712
15 Summer -1.0212200
16 Winter 0.5164757
17 Summer 2.2103486
18 Summer 0.6049139
19 Winter -0.1642906
20 Spring 1.5057525

Looking at the structure of df2 to understand the difference between factor and numerical columns −

> str(df2)
'data.frame': 20 obs. of 2 variables:
$ y1: Factor w/ 3 levels "Spring","Summer",..: 2 3 2 1 3 3 1 1 2 2 ...
$ y2: num -0.901 0.89 0.259 1.512 -1.028 ...
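The integers that str() prints after the levels are the internal codes of the factor: each code is a position in the levels vector. The sketch below illustrates this with a short, hand-written vector rather than random data; the names `codes`, `seasons` and `num` are hypothetical and chosen for illustration.

```r
# Levels default to the sorted unique values
y1 <- factor(c("Summer", "Winter", "Summer", "Spring"))
levels(y1)               # "Spring" "Summer" "Winter"

# Each value is stored as an index into the levels
codes <- as.integer(y1)  # 2 3 2 1
levels(y1)[codes]        # "Summer" "Winter" "Summer" "Spring"

# The level order can be set explicitly with the levels argument
seasons <- factor(y1, levels = c("Winter", "Spring", "Summer"))
levels(seasons)          # "Winter" "Spring" "Summer"
as.integer(seasons)      # 3 1 3 2

# A numeric vector, like a character vector, has no levels
num <- c(-0.9, 0.89, 0.26)
levels(num)              # NULL
```

Setting the order explicitly matters whenever the alphabetical default is not meaningful, for example in plots or model contrasts.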

Updated on: 04-Jan-2021
