
Scikit Learn - Logistic Regression
Logistic regression, despite its name, is a classification algorithm rather than a regression algorithm. Given a set of independent variables, it estimates a discrete value (0 or 1, yes/no, true/false). It is also called the logit or MaxEnt classifier.
It models the relationship between a categorical dependent variable and one or more independent variables by estimating the probability that an event occurs, using the logistic function.
sklearn.linear_model.LogisticRegression is the class used to implement logistic regression.
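As a rough illustration of how a logistic function turns a linear score into a probability, consider the minimal sketch below. The function name `logistic` and the sample scores are hypothetical and chosen only for illustration; this is not scikit-learn's internal implementation.

```python
import numpy as np


def logistic(z):
    """Map a real-valued score z to a probability in the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical linear scores (weights . features + intercept) for three samples
scores = np.array([-2.0, 0.0, 2.0])

# Estimated probability of the positive class for each sample
probabilities = logistic(scores)

# Thresholding at 0.5 turns the probabilities into discrete labels (0 or 1)
predicted_labels = (probabilities >= 0.5).astype(int)

print(probabilities)
print(predicted_labels)
```

A score of 0 maps to a probability of exactly 0.5, so the sign of the linear score decides the predicted class under this threshold.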
Parameters
The following table lists the parameters used by the LogisticRegression class −
Sr.No | Parameter & Description |
---|---|
1 | penalty − str, ‘l1’, ‘l2’, ‘elasticnet’ or none, optional, default = ‘l2’. This parameter specifies the norm (L1 or L2) used in penalization (regularization). |
2 | dual − Boolean, optional, default = False. It selects the dual or primal formulation. The dual formulation is only implemented for the L2 penalty with the liblinear solver. |
3 | tol − float, optional, default = 1e-4. It represents the tolerance for the stopping criteria. |
4 | C − float, optional, default = 1.0. It represents the inverse of regularization strength and must be a positive float. Smaller values specify stronger regularization. |
5 | fit_intercept − Boolean, optional, default = True. This parameter specifies whether a constant (bias or intercept) should be added to the decision function. |
6 | intercept_scaling − float, optional, default = 1. This parameter is useful only when the solver ‘liblinear’ is used and fit_intercept is set to True. |
7 | class_weight − dict or ‘balanced’, optional, default = None. It represents the weights associated with classes. With the default, all classes have weight one. With class_weight = ‘balanced’, the values of y are used to adjust the weights automatically, inversely proportional to class frequencies. |
8 | random_state − int, RandomState instance or None, optional, default = None. This parameter sets the seed of the pseudo-random number generator used when shuffling the data. An int is used as the seed, a RandomState instance is used as the generator itself, and with None the generator is the RandomState instance used by np.random. |
9 | solver − str, {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, optional, default = ‘liblinear’ (‘lbfgs’ from scikit-learn 0.22 onward). This parameter selects the algorithm used in the optimization problem. ‘liblinear’ suits small datasets, while ‘sag’ and ‘saga’ are faster on large ones. ‘newton-cg’, ‘lbfgs’ and ‘sag’ handle only the L2 penalty or no penalty, ‘liblinear’ and ‘saga’ also handle L1, and only ‘saga’ supports ‘elasticnet’. |
10 | max_iter − int, optional, default = 100. As the name suggests, it represents the maximum number of iterations taken for the solvers to converge. |
11 | multi_class − str, {‘ovr’, ‘multinomial’, ‘auto’}, optional, default = ‘ovr’ (‘auto’ from scikit-learn 0.22 onward). With ‘ovr’, a binary problem is fit for each label. With ‘multinomial’, the loss minimized is the multinomial loss fit across the entire probability distribution. |
12 | verbose − int, optional, default = 0. For the liblinear and lbfgs solvers, set verbose to any positive number to enable verbose output. |
13 | warm_start − bool, optional, default = False. With this parameter set to True, the solution of the previous call to fit is reused as initialization. With the default, False, the previous solution is erased. |
14 | n_jobs − int or None, optional, default = None. If multi_class = ‘ovr’, this parameter represents the number of CPU cores used when parallelizing over classes. It is ignored when solver = ‘liblinear’. |
15 | l1_ratio − float or None, optional, default = None. It is used only when penalty = ‘elasticnet’. It is the Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. |
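To make the role of C more concrete, the sketch below fits the same model with three values of C and compares the size of the learned coefficients. The chosen C values, the raised max_iter, and the variable name `coef_norms` are assumptions made for illustration; all other parameters are left at their defaults.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C is the inverse of regularization strength: a smaller C penalizes
# large coefficients more heavily.
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    # max_iter is raised above the default so the solver has room to converge
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C}: coefficient norm = {coef_norms[C]:.3f}")
```

The printed norms should grow as C increases, since weaker regularization lets the coefficients move further from zero.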
Attributes
The following table lists the attributes of the LogisticRegression class −
Sr.No | Attributes & Description |
---|---|
1 | coef_ − array, shape (1, n_features) or (n_classes, n_features). It holds the coefficients of the features in the decision function. When the given problem is binary, it is of shape (1, n_features). |
2 | intercept_ − array, shape (1,) or (n_classes,). It represents the constant, also known as bias, added to the decision function. |
3 | classes_ − array, shape (n_classes,). It provides the list of class labels known to the classifier. |
4 | n_iter_ − array, shape (n_classes,) or (1,). It holds the actual number of iterations for all the classes. |
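The attributes above can be inspected after fitting. The following is a minimal sketch on the iris dataset (3 classes, 4 features); the variable name `clf` and the raised max_iter are illustrative choices, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.classes_)          # class labels seen during fit
print(clf.coef_.shape)       # (3, 4): one row of coefficients per class
print(clf.intercept_.shape)  # (3,): one intercept per class
print(clf.n_iter_)           # iterations actually run by the solver
```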
Implementation Example
The following Python script provides a simple example of implementing logistic regression on the iris dataset of scikit-learn −

    from sklearn import linear_model
    from sklearn.datasets import load_iris

    # Load the iris features (X) and class labels (y)
    X, y = load_iris(return_X_y=True)

    # Fit a logistic regression classifier with the liblinear solver
    LRG = linear_model.LogisticRegression(
        random_state=0, solver='liblinear', multi_class='auto'
    ).fit(X, y)

    # Mean accuracy on the data the model was trained on
    LRG.score(X, y)

Output
0.96
The output shows that the above logistic regression model reaches an accuracy of 96 percent on the data it was trained on.
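Because the score above is computed on the same data used for fitting, it may overstate how well the model generalizes. One common way to get a less optimistic estimate is to hold out part of the data, as in the sketch below. The 30 percent test size, the stratified split, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples; stratify keeps the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(solver='liblinear', random_state=0).fit(X_train, y_train)

# Mean accuracy on samples the model has not seen during fitting
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)

# Estimated class probabilities for the first held-out sample
proba = clf.predict_proba(X_test[:1])
print(proba)
```

The predict_proba call returns one probability per class, which is the quantity logistic regression estimates before a label is chosen.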