Logistic Regression in Python - Introduction


Logistic Regression is a statistical method of classification of objects. This chapter will give an introduction to logistic regression with the help of some examples.


To understand logistic regression, you should know what classification means. Let us consider the following examples to understand this better −

  • A doctor classifies the tumor as malignant or benign.
  • A bank transaction may be fraudulent or genuine.

For many years, humans have been performing such tasks - albeit they are error-prone. The question is can we train machines to do these tasks for us with a better accuracy?

One such example of machine doing the classification is the email Client on your machine that classifies every incoming mail as “spam” or “not spam” and it does it with a fairly large accuracy. The statistical technique of logistic regression has been successfully applied in email client. In this case, we have trained our machine to solve a classification problem.

Logistic Regression is just one part of machine learning used for solving this kind of binary classification problem. There are several other machine learning techniques that are already developed and are in practice for solving other kinds of problems.

If you have noted, in all the above examples, the outcome of the predication has only two values - Yes or No. We call these as classes - so as to say we say that our classifier classifies the objects in two classes. In technical terms, we can say that the outcome or target variable is dichotomous in nature.

There are other classification problems in which the output may be classified into more than two classes. For example, given a basket full of fruits, you are asked to separate fruits of different kinds. Now, the basket may contain Oranges, Apples, Mangoes, and so on. So when you separate out the fruits, you separate them out in more than two classes. This is a multivariate classification problem.