How to convert a data frame to a matrix in R if the data frame contains factor variables stored as strings?


A matrix can hold values of only one type, so to get a numeric matrix from a data frame that contains factor variables with string labels, the factor levels have to be converted to numbers. Each level is replaced by its integer code, which is its position among the levels of the factor. By default the levels are sorted alphabetically, so in a factor with levels a, b, c the level a gets 1, b gets 2, and so on. To do the conversion, we unlist the data frame, convert the result to numeric, and read it back into a matrix with the same number of rows as the data frame.
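As a rough illustration of how factor levels map to numbers, the sketch below uses a made-up vector (`f` and `m` are hypothetical names, not part of the article's data). It assumes the default behaviour of `factor()`, which sorts the levels, and also shows why a matrix cannot mix numbers and strings.

```r
# Hypothetical example vector, not part of the article's data
f <- factor(c("banana", "apple", "cherry", "apple"))

levels(f)       # levels are sorted alphabetically: "apple" "banana" "cherry"
as.integer(f)   # integer codes follow the level order: 2 1 3 1

# A matrix holds one type; mixing numbers and strings yields a character matrix
m <- cbind(1:2, c("a", "b"))
typeof(m)       # "character"
```

The code depends on the position of the whole label among the sorted levels, so two labels that start with the same letter still get different codes.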

Example

Consider the below data frame −

x1<-1:10
x2<-10:1
x3<-letters[1:10]
x4<-LETTERS[1:10]
x5<-letters[10:1]
x6<-LETTERS[10:1]
x7<-rnorm(10)
x8<-rnorm(10,0.2)
x9<-rnorm(10,0.5)
x10<-rnorm(10,1)
# stringsAsFactors=TRUE is needed in R 4.0.0 and later to get factor columns
df<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,stringsAsFactors=TRUE)
str(df)

Output

'data.frame': 10 obs. of 10 variables:
$ x1 : int 1 2 3 4 5 6 7 8 9 10
$ x2 : int 10 9 8 7 6 5 4 3 2 1
$ x3 : Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
$ x4 : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
$ x5 : Factor w/ 10 levels "a","b","c","d",..: 10 9 8 7 6 5 4 3 2 1
$ x6 : Factor w/ 10 levels "A","B","C","D",..: 10 9 8 7 6 5 4 3 2 1
$ x7 : num 0.526 -0.795 1.428 -1.467 -0.237 ...
$ x8 : num 0.0362 0.9085 -0.068 -1.2639 0.9444 ...
$ x9 : num 1.395 0.779 1.508 -1.573 1.69 ...
$ x10: num 1.482 1.758 -1.319 0.54 -0.105 ...
 df
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 1 10 a A j J 0.5264481 0.03624433 1.3949372 1.4824588
2 2 9 b B i I -0.7948444 0.90852210 0.7791520 1.7582138
3 3 8 c C h H 1.4277555 -0.06798055 1.5078658 -1.3193274
4 4 7 d D g G -1.4668197 -1.26392176 -1.5731065 0.5404952
5 5 6 e E f F -0.2366834 0.94443582 1.6898534 -0.1053837
6 6 5 f F e E -0.1933380 -1.21039018 -0.2243742 1.4029283
7 7 4 g G d D -0.8497547 0.66706761 0.6679838 1.5689349
8 8 3 h H c C 0.0584655 0.08067989 1.4203352 0.2939167
9 9 2 i I b B -0.8176704 0.66723896 -1.1716048 0.7099094
10 10 1 j J a A -2.0503078 0.69813556 0.9484691 -0.4838781

Converting the data frame df to a numeric matrix −

Example

matrix(as.numeric(unlist(df)),nrow=nrow(df))

Output

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 10 1 1 10 10 0.5264481 0.03624433 1.3949372
[2,] 2 9 2 2 9 9 -0.7948444 0.90852210 0.7791520
[3,] 3 8 3 3 8 8 1.4277555 -0.06798055 1.5078658
[4,] 4 7 4 4 7 7 -1.4668197 -1.26392176 -1.5731065
[5,] 5 6 5 5 6 6 -0.2366834 0.94443582 1.6898534
[6,] 6 5 6 6 5 5 -0.1933380 -1.21039018 -0.2243742
[7,] 7 4 7 7 4 4 -0.8497547 0.66706761 0.6679838
[8,] 8 3 8 8 3 3 0.0584655 0.08067989 1.4203352
[9,] 9 2 9 9 2 2 -0.8176704 0.66723896 -1.1716048
[10,] 10 1 10 10 1 1 -2.0503078 0.69813556 0.9484691
[,10]
[1,] 1.4824588
[2,] 1.7582138
[3,] -1.3193274
[4,] 0.5404952
[5,] -0.1053837
[6,] 1.4029283
[7,] 1.5689349
[8,] 0.2939167
[9,] 0.7099094
[10,] -0.4838781
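A possible alternative, not the method used in this article, is the base R function `data.matrix()`, which converts factor columns to their integer codes and keeps the column names. The sketch below uses a small made-up data frame (`d` and its column names are hypothetical).

```r
# Small illustrative data frame (names are hypothetical)
d <- data.frame(id = 1:3,
                grp = c("b", "a", "c"),
                val = c(0.5, 1.5, 2.5),
                stringsAsFactors = TRUE)

m <- data.matrix(d)   # factor columns become their integer codes
m
colnames(m)           # "id" "grp" "val"
m[, "grp"]            # 2 1 3
```

Keeping the column names can make the resulting matrix easier to read than the unnamed columns produced by matrix(as.numeric(unlist(df)),nrow=nrow(df)).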

Let’s have a look at another example −

Example

y1<-c("Age","Sex","Salary","Education","Ethnicity")
y2<-1:5
y3<-c(24,15,48,72,29)
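# stringsAsFactors=TRUE is needed in R 4.0.0 and later to get a factor column
df_y<-data.frame(y1,y2,y3,stringsAsFactors=TRUE)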
df_y

Output

y1 y2 y3
1 Age 1 24
2 Sex 2 15
3 Salary 3 48
4 Education 4 72
5 Ethnicity 5 29

Example

matrix(as.numeric(unlist(df_y)),nrow=5)

Output

[,1] [,2] [,3]
[1,] 1 1 24
[2,] 5 2 15
[3,] 4 3 48
[4,] 2 4 72
[5,] 3 5 29
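The first column of this output can be checked by looking at the factor levels directly. The sketch below assumes the default alphabetical level order. Its second part shows one possible way to handle plain character columns, which is what `data.frame()` creates by default since R 4.0.0; the helper data frame `d` is hypothetical.

```r
y1 <- c("Age", "Sex", "Salary", "Education", "Ethnicity")

f <- factor(y1)
levels(f)        # "Age" "Education" "Ethnicity" "Salary" "Sex"
as.integer(f)    # 1 5 4 2 3, matching the first column of the matrix above

# With character (non-factor) columns, unlist() yields strings and
# as.numeric() would give NA with a warning, so convert the columns explicitly
d <- data.frame(y1, y2 = 1:5, stringsAsFactors = FALSE)
d[] <- lapply(d, function(col) if (is.character(col)) as.integer(factor(col)) else col)
as.matrix(d)
```

Education and Ethnicity both start with E but receive the codes 2 and 3, because the code comes from the position of the full label in the sorted levels.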

Updated on: 21-Aug-2020
