Contingency Table in Python

Python Server Side Programming Programming

A contingency table is a table showing the distribution of one variable in rows and another variable in columns. It is used to study the correlation between the two variables. It is a multiway table which describes a dataset in which each observation belongs to one category for each of several variables. Also It is basically a tally of counts between two or more categorical variables. Contingency tables are also called crosstabs or two-way tables,used in statistics to summarize the relationship between several categorical variables.

The contingency coefficient is a coefficient of association which tells whether two variables or datasets are independent or dependent of each other,It is also known as Pearson's Coefficient

Example

In the below example we take the iris flower data set for analysis. This data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. We will create contingency model on these features which will be ultimately used in distinguishing the species from each other.

Reading the Dataset

Example

import numpy as np
import pandas as pd
datainput = pd.read_csv("iris.csv")
print (datainput.head(5))

Running the above code gives us the following result:

SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

General Statistics of the Data

Next, we gather the general statistics of the data by using the describe(). IT gives an idea about the mean and different quartiles of how the data is distributed.

Example

import numpy as np
import pandas as pd
datainput = pd.read_csv("iris.csv")
print(datainput.describe())

Running the above code gives us the following result:

SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000

Data Types

Next we observe different data types of the columns in the dataframe.

Example

import numpy as np
import pandas as pd
datainput = pd.read_csv("iris.csv")
print(datainput.dtypes)

Running the above code gives us the following result:

SepalLengthCm float64
SepalWidthCm float64
PetalLengthCm float64
PetalWidthCm float64
Species object
dtype: object

Creating Contingency Table

Now we create a contingency table for the column showing petal width for each species. For this we use the crosstab function available in pandas and give these tow column’s names as inputs.

Example

import numpy as np
import pandas as pd
datainput = pd.read_csv("iris.csv")
width_species = pd.crosstab(datainput['PetalWidthCm'],datainput['Species'],margins = False)
print(width_species)

Running the above code gives us the following result:

Species Iris-setosa Iris-versicolor Iris-virginica
PetalWidthCm
0.1 6 0 0
0.2 28 0 0
0.3 7 0 0
1.0 0 7 0
1.1 0 3 0
1.2 0 5 0
1.8 0 1 11
1.9 0 0 5
2.0 0 0 6
2.1 0 0 6
2.5 0 0 3

Multi-variate Contingency Table

In this case we use more than two columns to create the contingency table. Here we use both petal length and petal width for each type of species.

import numpy as np
import pandas as pd
datainput = pd.read_csv("iris.csv")
length_width_species = pd.crosstab([datainput.PetalLengthCm, datainput.PetalWidthCm],datainput.Species, margins = False)
print(length_width_species)

Running the above code gives us the following result:

Species Iris-setosa Iris-versicolor Iris-virginica
PetalLengthCm PetalWidthCm
1.0 0.2 1 0 0
1.1 0.1 1 0 0
1.2 0.2 2 0 0
1.3 0.2 4 0 0
0.3 2 0 0
... ... ... ...
6.4 2.0 0 0 1
6.6 2.1 0 0 1
6.7 2.0 0 0 1
2.2 0 0 1
6.9 2.3 0 0 1

Pavitra

e

Updated on: 30-Dec-2019

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started

Advertisements