How to test for a significant relationship between two categorical columns of an R data frame?
To test for a significant relationship (association) between two categorical columns of an R data frame, we first build the contingency table of those columns with table and then apply the chi-square test of independence with chisq.test. For example, if we have a data frame called df that contains two categorical columns, say C1 and C2, then the test can be done with the command chisq.test(table(df$C1,df$C2)). A p-value below the chosen significance level (commonly 0.05) indicates a significant relationship; a larger p-value means the data give no evidence against independence of the two columns.
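As a minimal sketch of the command described above, the following uses the built-in mtcars data set, which is not part of the original examples and is chosen here only for illustration. It treats the cyl and am columns as categorical and shows that passing the contingency table and passing the two columns directly lead to the same test, since chisq.test cross-tabulates two vectors itself.

```r
# Illustration on the built-in mtcars data (an assumption for this sketch,
# not a data set used in the article). cyl and am are treated as categorical.
tbl <- table(mtcars$cyl, mtcars$am)

# Route 1: pass the contingency table, as in the command above.
# suppressWarnings() hides the small-expected-count warning for this small data set.
res_table <- suppressWarnings(chisq.test(tbl))

# Route 2: pass the two columns directly; chisq.test() coerces them to
# factors and builds the contingency table internally.
res_cols <- suppressWarnings(chisq.test(mtcars$cyl, mtcars$am))

# The components of the result can be extracted by name.
res_table$statistic   # chi-square statistic
res_table$parameter   # degrees of freedom
res_table$p.value     # p-value

# Both routes should agree.
all.equal(unname(res_table$p.value), unname(res_cols$p.value))
```

Storing the result in an object, rather than only printing it, makes the statistic, degrees of freedom, and p-value available for later steps.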
Example
# No seed is set, so sample() returns different values on every run
x1<-sample(LETTERS[1:4],20,replace=TRUE)
y1<-sample(letters[1:4],20,replace=TRUE)
df1<-data.frame(x1,y1)
df1
Output
   x1 y1
1   D  a
2   B  d
3   D  d
4   B  d
5   A  a
6   A  b
7   B  c
8   D  d
9   C  d
10  D  c
11  C  a
12  D  c
13  D  a
14  A  a
15  B  d
16  A  c
17  C  d
18  A  d
19  C  b
20  D  a
Example
table(df1$x1,df1$y1)
Output
    a b c d
  A 2 1 1 1
  B 0 0 1 3
  C 1 1 0 2
  D 3 0 2 2
Testing for a significant relationship between columns x1 and y1 of df1:
Example
chisq.test(table(df1$x1,df1$y1))
Output
	Pearson's Chi-squared test

data:  table(df1$x1, df1$y1)
X-squared = 7.4464, df = 9, p-value = 0.5907

Warning message:
In chisq.test(table(df1$x1, df1$y1)) :
  Chi-squared approximation may be incorrect
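The warning in the output above is raised by chisq.test when some expected cell counts are below 5, in which case the chi-square approximation is unreliable. The sketch below rebuilds the contingency table from the counts printed earlier (the names tab1, res1, and res1_sim are introduced only for this illustration), inspects the expected counts, and then tries two alternatives that do not rely on the approximation: a Monte Carlo p-value and Fisher's exact test. The seed and the number of replicates B are arbitrary choices.

```r
# Contingency table copied from the table(df1$x1, df1$y1) output above
# (rows: x1, columns: y1).
tab1 <- as.table(matrix(c(2, 1, 1, 1,
                          0, 0, 1, 3,
                          1, 1, 0, 2,
                          3, 0, 2, 2),
                        nrow = 4, byrow = TRUE,
                        dimnames = list(x1 = LETTERS[1:4],
                                        y1 = letters[1:4])))

# Same test as above; the warning is suppressed because it is examined below.
res1 <- suppressWarnings(chisq.test(tab1))

# Expected counts under independence: (row total * column total) / n.
res1$expected
any(res1$expected < 5)   # small expected counts are what trigger the warning

# Alternative 1: p-value by Monte Carlo simulation instead of the
# chi-square approximation (seed and B are arbitrary).
set.seed(1)
res1_sim <- chisq.test(tab1, simulate.p.value = TRUE, B = 10000)
res1_sim$p.value

# Alternative 2: Fisher's exact test on the same table.
fisher.test(tab1)$p.value
```

With only 20 observations spread over 16 cells, small expected counts are unavoidable, so one of the two alternatives is usually the safer choice for tables like this one.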
Example
x2<-sample(c("hot","cold"),20,replace=TRUE)
y2<-sample(c("summer","winter","spring"),20,replace=TRUE)
df2<-data.frame(x2,y2)
df2
Output
     x2     y2
1  cold winter
2   hot winter
3   hot winter
4   hot spring
5  cold summer
6  cold summer
7  cold spring
8   hot winter
9  cold summer
10  hot spring
11  hot winter
12 cold winter
13 cold winter
14  hot summer
15  hot winter
16  hot summer
17  hot summer
18 cold summer
19 cold spring
20  hot summer
Example
table(df2$x2,df2$y2)
Output
       spring summer winter
  cold      2      4      3
  hot       2      4      5
Testing for a significant relationship between columns x2 and y2 of df2:
Example
chisq.test(table(df2$x2,df2$y2))
Output
	Pearson's Chi-squared test

data:  table(df2$x2, df2$y2)
X-squared = 0.30303, df = 2, p-value = 0.8594

Warning message:
In chisq.test(table(df2$x2, df2$y2)) :
  Chi-squared approximation may be incorrect
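To turn the printed result into a decision, the p-value can be extracted from the test object and compared with a significance level. The sketch below rebuilds the table from the counts shown above; the names tab2, res2, and alpha are introduced only for this illustration, and 0.05 is a conventional choice rather than something fixed by the test.

```r
# Contingency table copied from the table(df2$x2, df2$y2) output above
# (rows: x2, columns: y2).
tab2 <- as.table(matrix(c(2, 4, 3,
                          2, 4, 5),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(x2 = c("cold", "hot"),
                                        y2 = c("spring", "summer", "winter"))))

# Same test as above; the small-expected-count warning is suppressed here.
res2 <- suppressWarnings(chisq.test(tab2))

alpha <- 0.05            # assumed significance level
p_value <- res2$p.value  # p-value of the test of independence

if (p_value < alpha) {
  message("Reject independence: x2 and y2 appear to be related.")
} else {
  message("Do not reject independence: no evidence of a relationship.")
}
```

For this table the p-value is far above 0.05, which is consistent with how the data were generated: x2 and y2 were sampled independently of each other.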