

How to test for a significant relationship between two categorical columns of an R data frame?
To test whether two categorical columns of an R data frame are significantly related, we first build the contingency table of those columns and then apply the chi-square test of independence using chisq.test. For example, if we have a data frame called df that contains two categorical columns, say C1 and C2, then the test can be run with the command chisq.test(table(df$C1,df$C2)). A small p-value (for example, below 0.05) suggests that the two columns are not independent.
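As a minimal sketch of the command described above, the snippet below runs the test two ways: on the contingency table, and by passing the two columns to chisq.test directly, which builds the same table internally. The data frame df and its values are hypothetical and made up purely for illustration; only the column names C1 and C2 come from the text.

```r
# Hypothetical data frame; the values are invented for illustration only
df <- data.frame(
  C1 = c("yes","yes","no","no","yes","no","yes","no","yes","no","yes","no"),
  C2 = c("low","low","high","high","low","high","high","low","low","high","low","high")
)

# Form 1: build the contingency table first, then test it
by_table <- suppressWarnings(chisq.test(table(df$C1, df$C2)))

# Form 2: pass the two columns; chisq.test cross-tabulates them itself
by_columns <- suppressWarnings(chisq.test(df$C1, df$C2))

# Both forms give the same statistic and p-value
# (for a 2 x 2 table, both apply Yates' continuity correction by default)
same_statistic <- isTRUE(all.equal(unname(by_table$statistic),
                                   unname(by_columns$statistic)))
same_p_value   <- isTRUE(all.equal(by_table$p.value, by_columns$p.value))
same_statistic
same_p_value
```

suppressWarnings is used here only because the toy sample is small; with real data it is usually better to leave warnings visible.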
Example
x1<-sample(LETTERS[1:4],20,replace=TRUE)
y1<-sample(letters[1:4],20,replace=TRUE)
df1<-data.frame(x1,y1)
df1
Output
   x1 y1
1   D  a
2   B  d
3   D  d
4   B  d
5   A  a
6   A  b
7   B  c
8   D  d
9   C  d
10  D  c
11  C  a
12  D  c
13  D  a
14  A  a
15  B  d
16  A  c
17  C  d
18  A  d
19  C  b
20  D  a
Example
table(df1$x1,df1$y1)
Output
    a b c d
  A 2 1 1 1
  B 0 0 1 3
  C 1 1 0 2
  D 3 0 2 2
Testing for a significant relationship between columns x1 and y1 of df1:
Example
chisq.test(table(df1$x1,df1$y1))
Output
Pearson's Chi-squared test

data: table(df1$x1, df1$y1)
X-squared = 7.4464, df = 9, p-value = 0.5907

Warning message:
In chisq.test(table(df1$x1, df1$y1)) :
  Chi-squared approximation may be incorrect
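The warning in the output above appears because some expected cell counts are small (R issues it when any expected count is below 5), so the chi-square approximation is less reliable. The sketch below rebuilds x1 and y1 from the printed df1 so that it is reproducible, inspects the expected counts, and tries two common alternatives: a Monte Carlo p-value and Fisher's exact test. The seed and the choice of B = 2000 replicates are assumptions for illustration, not part of the original example.

```r
# Rebuild the columns from the df1 output printed above
x1 <- c("D","B","D","B","A","A","B","D","C","D",
        "C","D","D","A","B","A","C","A","C","D")
y1 <- c("a","d","d","d","a","b","c","d","d","c",
        "a","c","a","a","d","c","d","d","b","a")
tab1 <- table(x1, y1)

# Expected counts under independence; small values explain the warning
res1 <- suppressWarnings(chisq.test(tab1))
res1$expected

# Alternative 1: p-value from simulated tables instead of the
# chi-square approximation (seed and B are arbitrary choices)
set.seed(1)
sim1 <- chisq.test(tab1, simulate.p.value = TRUE, B = 2000)
sim1$p.value

# Alternative 2: Fisher's exact test on the same table
fis1 <- fisher.test(tab1)
fis1$p.value
```

Both alternatives test the same null hypothesis of independence, so their p-values can be read in the same way as the one from chisq.test.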
Example
x2<-sample(c("hot","cold"),20,replace=TRUE)
y2<-sample(c("summer","winter","spring"),20,replace=TRUE)
df2<-data.frame(x2,y2)
df2
Output
     x2     y2
1  cold winter
2   hot winter
3   hot winter
4   hot spring
5  cold summer
6  cold summer
7  cold spring
8   hot winter
9  cold summer
10  hot spring
11  hot winter
12 cold winter
13 cold winter
14  hot summer
15  hot winter
16  hot summer
17  hot summer
18 cold summer
19 cold spring
20  hot summer
Example
table(df2$x2,df2$y2)
Output
       spring summer winter
  cold      2      4      3
  hot       2      4      5
Testing for a significant relationship between columns x2 and y2 of df2:
Example
chisq.test(table(df2$x2,df2$y2))
Output
Pearson's Chi-squared test

data: table(df2$x2, df2$y2)
X-squared = 0.30303, df = 2, p-value = 0.8594

Warning message:
In chisq.test(table(df2$x2, df2$y2)) :
  Chi-squared approximation may be incorrect
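To turn the printed result into a decision, the pieces of the test object can be read programmatically. The sketch below rebuilds df2 from the output printed above and compares the p-value with a significance level; the level alpha = 0.05 is an assumed, conventional choice rather than something fixed by the example.

```r
# Rebuild df2 from the output printed above
df2 <- data.frame(
  x2 = c("cold","hot","hot","hot","cold","cold","cold","hot","cold","hot",
         "hot","cold","cold","hot","hot","hot","hot","cold","cold","hot"),
  y2 = c("winter","winter","winter","spring","summer","summer","spring",
         "winter","summer","spring","winter","winter","winter","summer",
         "winter","summer","summer","summer","spring","summer")
)

# Run the test and keep the result object
res2 <- suppressWarnings(chisq.test(table(df2$x2, df2$y2)))

alpha <- 0.05                      # assumed significance level
p_value <- res2$p.value            # about 0.8594 for this data
significant <- p_value < alpha     # FALSE here

p_value
significant
```

Because the p-value is well above 0.05, this sample gives no evidence of a relationship between x2 and y2, which is what we would expect for two columns drawn independently with sample.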