How to calculate the prediction accuracy of logistic regression?


Logistic regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It is a form of regression analysis commonly used for classification when the dependent variable is binary (i.e., takes only two values). Its aim is to estimate the probability that the dependent variable takes a particular value, given the values of the independent variables.

Because it lets us estimate the probability of an event from the values of the independent variables, logistic regression is an important tool in data analysis and machine learning. It is widely used in fields where predicting outcomes is essential, including healthcare, finance, and marketing.

Accuracy is a key measure of how well a logistic regression model performs. The accuracy score is the proportion of predictions that were correct, that is, the number of correct predictions divided by the total number of predictions. A higher score means a larger share of the predictions matched the true labels; a lower score means more predictions were wrong. In this post, we'll look at how to calculate the prediction accuracy of a logistic regression model.
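To make the definition concrete, here is a minimal sketch that computes accuracy by hand in plain Python. The labels are made up for illustration, and the helper name manual_accuracy is hypothetical rather than part of any library.

```python
# Toy labels, invented purely for illustration (not taken from a real dataset)
toy_true = [1, 0, 1, 1, 0, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def manual_accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    # Count the positions where the predicted label equals the actual label
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


toy_accuracy = manual_accuracy(toy_true, toy_pred)
print("Toy accuracy:", toy_accuracy)  # 6 of 8 predictions match -> 0.75
```

Six of the eight toy predictions match the true labels, so the accuracy is 6 / 8 = 0.75.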

Calculating Prediction Accuracy of Logistic Regression

The following Python example uses scikit-learn to calculate the prediction accuracy of a logistic regression model on a real dataset. These are the steps we will follow −

  • First, import the necessary modules from sklearn.

  • Then, load the dataset.

  • Split the data into training and testing sets.

  • Create a logistic regression model and fit it to the training data.

  • Finally, predict labels for the test set and calculate the accuracy of those predictions.

In this example, we first load the breast cancer dataset with scikit-learn's load_breast_cancer function. We then use the train_test_split function to divide the data into training and testing sets, holding out 30% of the samples for testing. Next, we create a model with the LogisticRegression class and fit it to the training data using the fit method. We then call the predict method to generate predictions for the testing data and pass them, together with the true labels, to scikit-learn's accuracy_score function. Finally, we print the prediction accuracy to the console.

Example

# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset
data = load_breast_cancer()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)

# Create a logistic regression model
lr = LogisticRegression()

# Fit the model on the training data
lr.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = lr.predict(X_test)

# Calculate the prediction accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print the prediction accuracy
print("Prediction Accuracy:", accuracy)

Output

Prediction Accuracy: 0.9707602339181286
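As a cross-check, the sketch below computes the same quantity in three ways: with accuracy_score, with the model's own score method, and as the mean of the element-wise comparison between predictions and true labels. It repeats the data loading and split from the example above so that it runs on its own. The max_iter=10000 setting is an assumption of this sketch and is not part of the original example: with the default iteration limit the solver may emit a convergence warning on this unscaled data. Because of that change, the printed accuracy may differ slightly from the output shown above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same dataset and split as in the example above
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# max_iter=10000 is an assumption of this sketch (see the note above)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Three equivalent ways to obtain the accuracy on the test set
acc_metric = accuracy_score(y_test, y_pred)    # metrics function
acc_score = model.score(X_test, y_test)        # classifier's score method
acc_manual = float(np.mean(y_pred == y_test))  # fraction of matching labels

print("accuracy_score:", acc_metric)
print("model.score:   ", acc_score)
print("manual mean:   ", acc_manual)
```

All three values should agree, since for classifiers the score method reports mean accuracy and the manual mean is the definition of accuracy applied directly.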

Conclusion

In conclusion, prediction accuracy is a key measure of how well a logistic regression model performs: it is the proportion of the model's predictions that were correct. In the example above, the model classified about 97% of the test samples correctly.

Updated on: 25-Apr-2023
