Biopython - Machine Learning


Bioinformatics is an excellent area in which to apply machine learning algorithms. We have the genetic information of a large number of organisms, and it is not possible to analyze all of it manually. If a suitable machine learning algorithm is used, we can extract a lot of useful information from this data. Biopython provides a useful set of algorithms for supervised machine learning.

Supervised learning is based on an input variable (X) and an output variable (Y). It uses an algorithm to learn the mapping function from the input to the output, defined below −

Y = f(X)

The main objective of this approach is to approximate the mapping function well enough that, when you have new input data (x), you can predict the output variable (Y) for that data.

Logistic Regression Model

Logistic regression is a supervised machine learning algorithm. It is used to discriminate between K classes using a weighted sum of the predictor variables. It computes the probability that an event occurs and can be used, for example, for cancer detection.

Biopython provides the Bio.LogisticRegression module to predict variables based on the logistic regression algorithm. Currently, Biopython implements logistic regression for two classes only (K = 2).
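To show what the two-class model computes, here is a minimal plain-Python sketch that fits a logistic regression by gradient ascent on the log-likelihood. It is not Bio.LogisticRegression and does not reproduce that module's interface: the function names (train_logistic, predict_proba, classify), the learning rate, the number of epochs and the toy data are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps a weighted sum to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sum(weights, x):
    """w0 (intercept) plus the weighted sum of the predictor variables."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights by batch gradient ascent on the log-likelihood.

    xs: list of feature vectors; ys: list of class labels, each 0 or 1 (K = 2).
    lr and epochs are illustrative defaults, not tuned values.
    """
    weights = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, ys):
            # (observed label - predicted probability) drives the gradient.
            error = y - sigmoid(weighted_sum(weights, x))
            grad[0] += error
            for j, xi in enumerate(x):
                grad[j + 1] += error * xi
        # Step uphill on the average log-likelihood.
        weights = [w + lr * g / len(xs) for w, g in zip(weights, grad)]
    return weights


def predict_proba(weights, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(weighted_sum(weights, x))


def classify(weights, x):
    """Assign class 1 if its probability is at least 0.5, else class 0."""
    return 1 if predict_proba(weights, x) >= 0.5 else 0


# Made-up, linearly separable toy data with two predictor variables.
xs = [[-2.0, -1.0], [-1.5, -0.5], [-1.0, -1.5], [1.0, 1.5], [1.5, 0.5], [2.0, 1.0]]
ys = [0, 0, 0, 1, 1, 1]

weights = train_logistic(xs, ys)
print(classify(weights, [-1.8, -1.2]))  # prints 0
print(classify(weights, [1.7, 1.1]))    # prints 1
```

The probability comes from passing the weighted sum w0 + w1*x1 + w2*x2 through the logistic function, and classify() simply thresholds that probability at 0.5 to pick one of the two classes.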

k-Nearest Neighbors

k-Nearest neighbors is also a supervised machine learning algorithm. It classifies a new data point according to the classes of its nearest neighbors in the training data. Biopython provides the Bio.kNN module to predict variables based on the k-nearest neighbors algorithm.
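The idea can be sketched in a few lines of plain Python: measure the distance from the new point to every training point, take the k closest, and let them vote. This is a standalone illustration, not Biopython's k-nearest neighbors module; the helper names (euclidean, knn_classify), the choice of Euclidean distance, k = 3 and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train_xs, train_ys, x, k=3):
    """Predict the class of x by majority vote among its k nearest training points."""
    # Sort the labelled training points by distance to x and keep the k closest.
    neighbors = sorted(zip(train_xs, train_ys), key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common() orders equal counts by first appearance, so a tie goes to
    # the class seen first in the closest-first neighbor list.
    return votes.most_common(1)[0][0]


# Made-up training data: two well-separated groups labelled "A" and "B".
train_xs = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_ys = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(train_xs, train_ys, [1.1, 1.0]))  # prints A
print(knn_classify(train_xs, train_ys, [5.0, 4.8]))  # prints B
```

There is no training step beyond storing the labelled points; all the work happens at prediction time, which is why k-nearest neighbors is often called a lazy learner.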

Naive Bayes

Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features is assumed to be independent of each other, given the class. Biopython provides the Bio.NaiveBayes module to work with the Naive Bayes algorithm.
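A minimal sketch of the principle, for categorical features: because the features are assumed independent given the class, the score for each class is just its prior probability multiplied by one conditional probability per feature (added as logarithms to avoid underflow). This is plain Python, not the Bio.NaiveBayes interface; the function names, the add-one (Laplace) smoothing and the toy feature values are hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(xs, ys):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(ys)
    # feature_counts[label][i][value] = how often feature i took `value` in class `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen for each feature index
    for x, y in zip(xs, ys):
        for i, value in enumerate(x):
            feature_counts[y][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def classify_naive_bayes(model, x):
    """Return the class with the highest log P(class) + sum_i log P(x_i | class)."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(x):
            # Add-one smoothing so an unseen value does not give log(0).
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up data: two categorical features per example, two classes.
xs = [["high", "yes"], ["high", "yes"], ["high", "no"],
      ["low", "no"], ["low", "no"], ["low", "yes"]]
ys = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = train_naive_bayes(xs, ys)
print(classify_naive_bayes(model, ["high", "yes"]))  # prints pos
print(classify_naive_bayes(model, ["low", "no"]))    # prints neg
```

The independence assumption is what keeps training this cheap: the model only needs per-class counts for each feature on its own, never counts for combinations of features.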

Markov Model

A Markov model is a mathematical model of a system, described as a collection of random variables, that transitions from one state to another according to certain probabilistic rules; the probability of moving to the next state depends only on the current state. Biopython provides the Bio.MarkovModel and Bio.HMM.MarkovModel modules to work with Markov models.
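The simplest case is a first-order Markov chain whose states are observed directly. The sketch below, which uses neither Biopython module, estimates transition probabilities P(next | current) from DNA sequences by counting adjacent nucleotide pairs, then scores new sequences with them. The function names, the pseudocount of 1 and the training sequences are hypothetical; there are no hidden states here, unlike a hidden Markov model.

```python
import math
from collections import Counter, defaultdict


def train_markov_chain(sequences, alphabet="ACGT", pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current).

    The pseudocount keeps unseen transitions from getting probability zero.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    transitions = {}
    for current in alphabet:
        row_total = sum(counts[current].values()) + pseudocount * len(alphabet)
        transitions[current] = {
            nxt: (counts[current][nxt] + pseudocount) / row_total for nxt in alphabet
        }
    return transitions


def log_probability(transitions, seq):
    """Log-probability of seq given its first symbol.

    By the Markov property each symbol depends only on the one before it,
    so the probability is a product of one transition term per adjacent pair.
    """
    return sum(math.log(transitions[a][b]) for a, b in zip(seq, seq[1:]))


# Made-up, CG-rich training sequences.
training = ["ACGCGCGT", "CGCGCGAT"]
transitions = train_markov_chain(training)

# A CG-rich query scores higher than an AT-rich one under this model.
print(log_probability(transitions, "CGCGCG") > log_probability(transitions, "ATATAT"))  # prints True
```

Each row of the transition table sums to 1, so the table is a proper set of conditional distributions; comparing log-probabilities under two such models (for example, one trained on CpG islands and one on background DNA) is a common way to use them as classifiers.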