- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to create a subset based on levels of a character column in R?
In R programming, mostly the columns with string values can be either represented by character data type or factor data type. For example, if we have a column Group with four unique values as A, B, C, and D then it can be of character or factor with four levels. If we want to take the subset of these columns then subset function can be used. Check out the example below.
Consider the below data frame −
Example
set.seed(888) Grp<-sample(c("A","B","C"),20,replace=TRUE)Age<-sample(21:50,20) df1<-data.frame(Grp,Age) df1
Output
Grp Age 1 A 35 2 C 40 3 C 48 4 C 46 5 C 36 6 C 33 7 B 47 8 A 45 9 B 43 10 B 37 11 B 30 12 A 24 13 C 39 14 C 50 15 C 25 16 A 34 17 B 49 18 A 44 19 C 38 20 B 26
str(df1) 'data.frame': 20 obs. of 2 variables:
$ Grp: chr "A" "C" "C" "C" ... $ Age: int 35 40 48 46 36 33 47 45 43 37 ...
Taking subset of df1 based on Grp column values A and C −
Example
subset(df1, Grp %in% c("A","C"))
Output
Grp Age 1 A 35 2 C 40 3 C 48 4 C 46 5 C 36 6 C 33 8 A 45 12 A 24 13 C 39 14 C 50 15 C 25 16 A 34 18 A 44 19 C 38
Let’s have a look at another example −
Example
Class<-sample(c("First","Second","Third","Fourth"),20,replace=TRUE) Score<-sample(1:10,20,replace=TRUE) df2<-data.frame(Class,Score) df2
Output
Class Score 1 First 10 2 First 3 3 First 1 4 First 7 5 First 1 6 Third 4 7 First 3 8 First 3 9 Second 2 10 First 8 11 Fourth 1 12 Third 6 13 First 6 14 Second 1 15 First 8 16 Fourth 4 17 Third 7 18 Fourth 4 19 Third 7 20 Fourth 1
str(df2) 'data.frame': 20 obs. of 2 variables:
$ Class: chr "First" "Third" "Second" "First" ... $ Score: int 1 4 9 8 9 10 2 8 5 8 ...
Taking subset of df2 based on Class column values First and Fourth −
Example
subset(df2, Class %in% c("First","Fourth"))
Output
Class Score 1 First 1 4 First 8 5 First 9 6 Fourth 10 7 Fourth 2 9 Fourth 5 10 Fourth 8 11 Fourth 8 13 Fourth 7 14 Fourth 10 15 First 7 16 Fourth 10 17 Fourth 4 19 First 2 20 First 10
Advertisements