How to find the median for factor levels in R?
The median, the second most commonly used measure of central tendency, is preferred when the data are ordinal or when continuous data contain outliers. When a data frame has a factor column, we may also want the median of a numeric column for each factor level so that the levels can be compared with each other. An easy way to do this is to pass summary to the aggregate function, which reports the median for each level along with the other summary statistics.
Example
Consider the below data frame that contains one factor column −
set.seed(191)
x1<-as.factor(sample(LETTERS[1:3],20,replace=TRUE))
x2<-sample(1:10,20,replace=TRUE)
df1<-data.frame(x1,x2)
df1
Output
   x1 x2
1   B  6
2   C  5
3   B  4
4   C  8
5   B  5
6   B  5
7   A  4
8   C  8
9   C  3
10  C  4
11  B  9
12  A 10
13  C  6
14  C  1
15  A 10
16  A  3
17  A  5
18  C  7
19  B  3
20  C  1
Example
str(df1)
Output
'data.frame': 20 obs. of 2 variables:
 $ x1: Factor w/ 3 levels "A","B","C": 2 3 2 3 2 2 1 3 3 3 ...
 $ x2: int 6 5 4 8 5 5 4 8 3 4 ...
Finding the median of x2 for the categories in x1 −
Example
aggregate(x2~x1,data=df1,summary)
Output
  x1   x2.Min. x2.1st Qu. x2.Median   x2.Mean x2.3rd Qu.   x2.Max.
1  A  3.000000   4.000000  5.000000  6.400000  10.000000 10.000000
2  B  3.000000   4.250000  5.000000  5.333333   5.750000  9.000000
3  C  1.000000   3.000000  5.000000  4.777778   7.000000  8.000000
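If only the median is needed, one option is to pass median instead of summary to aggregate, or to use tapply. The sketch below is a minimal illustration, not the only way to do it. It rebuilds df1 from the values printed in the output above, so that it does not depend on the random number generator of a particular R version; the names med_df and med_vec are illustrative assumptions.

```r
# Rebuild df1 from the printed values (copied from the output above)
x1 <- factor(c("B", "C", "B", "C", "B", "B", "A", "C", "C", "C",
               "B", "A", "C", "C", "A", "A", "A", "C", "B", "C"))
x2 <- c(6, 5, 4, 8, 5, 5, 4, 8, 3, 4, 9, 10, 6, 1, 10, 3, 5, 7, 3, 1)
df1 <- data.frame(x1, x2)

# One median per factor level, returned as a data frame
med_df <- aggregate(x2 ~ x1, data = df1, FUN = median)
med_df

# The same medians as a vector named by factor level
med_vec <- tapply(df1$x2, df1$x1, median)
med_vec
```

For this data, each level's median is 5, in agreement with the x2.Median column of the summary output.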
Let’s have a look at another example −
Example
Temperature<-as.factor(sample(c("Cold","Hot"),20,replace=TRUE))
Sales<-sample(50000:80000,20)
df2<-data.frame(Temperature,Sales)
df2
Output
   Temperature Sales
1         Cold 72210
2         Cold 56758
3          Hot 53809
4          Hot 79977
5          Hot 77135
6         Cold 56932
7          Hot 51104
8         Cold 67742
9          Hot 75402
10         Hot 62546
11        Cold 68520
12         Hot 54575
13        Cold 51591
14         Hot 55232
15         Hot 77742
16         Hot 62507
17         Hot 62156
18        Cold 73853
19        Cold 69807
20         Hot 53930
Finding the median of Sales for the categories in Temperature −
Example
aggregate(Sales~Temperature,data=df2,summary)
Output
  Temperature Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean Sales.3rd Qu.
1        Cold   51591.00      56888.50     68131.00   64676.62      70407.75
2         Hot   51104.00      54413.75     62331.50   63842.92      75835.25
  Sales.Max.
1   73853.00
2   79977.00
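When summary is used inside aggregate, the statistics are stored as a matrix inside a single column of the result, which is why the printed column names carry the Sales. prefix. The sketch below shows one way to pull out just the medians from that matrix. It rebuilds df2 from the values printed above rather than resampling, and the names res and meds are illustrative assumptions.

```r
# Rebuild df2 from the printed values (copied from the output above)
Temperature <- factor(c("Cold", "Cold", "Hot", "Hot", "Hot", "Cold", "Hot",
                        "Cold", "Hot", "Hot", "Cold", "Hot", "Cold", "Hot",
                        "Hot", "Hot", "Hot", "Cold", "Cold", "Hot"))
Sales <- c(72210, 56758, 53809, 79977, 77135, 56932, 51104, 67742, 75402,
           62546, 68520, 54575, 51591, 55232, 77742, 62507, 62156, 73853,
           69807, 53930)
df2 <- data.frame(Temperature, Sales)

res <- aggregate(Sales ~ Temperature, data = df2, summary)

# res$Sales is a matrix with one column per summary statistic,
# so the medians can be selected by column name
meds <- res$Sales[, "Median"]

# Pair each median with its factor level
data.frame(Temperature = res$Temperature, Median = meds)
```

The rows follow the order of the factor levels, so here Cold comes before Hot.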