How to calculate the prediction accuracy of logistic regression?
Logistic regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It is a form of regression analysis frequently used for classification tasks in which the dependent variable is binary (i.e., takes only two values). Its aim is to estimate how the independent variables affect the probability that the dependent variable takes a given value.
Because it lets us estimate the probability of an event from the values of the independent variables, logistic regression is an important tool in data analysis and machine learning. It is widely used in fields where predicting outcomes is essential, including healthcare, finance, and marketing.
Prediction accuracy is a key metric for evaluating a logistic regression model. The accuracy score is the proportion of all predictions that were correct: the number of correct predictions divided by the total number of predictions. A higher accuracy score generally indicates better model performance. In this article, we'll explore how to calculate the prediction accuracy of logistic regression using Python.
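To make the definition concrete, here is a minimal sketch in plain Python that computes accuracy for a small set of made-up labels. The function name `accuracy` and the sample labels are illustrative assumptions; they are not part of scikit-learn or of the dataset used later in this article.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    # Count the positions where the predicted label equals the true label
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical binary labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 6 of 8 predictions match -> 0.75
```

The same ratio, correct predictions over total predictions, is what the scikit-learn examples in the rest of the article report.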
Steps to Calculate Prediction Accuracy
To calculate the prediction accuracy of logistic regression, we follow these key steps:
1. Import necessary modules from scikit-learn
2. Load the dataset for training and testing
3. Split the data into training and testing sets
4. Create and train a logistic regression model
5. Make predictions on the test set
6. Calculate accuracy using the accuracy score
Example: Breast Cancer Dataset
Here's a complete example using the breast cancer dataset from scikit-learn:
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load the breast cancer dataset
data = load_breast_cancer()
# Split the dataset into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.3, random_state=42
)
# Create a logistic regression model
lr = LogisticRegression(max_iter=1000)
# Fit the model on the training data
lr.fit(X_train, y_train)
# Make predictions on the testing data
y_pred = lr.predict(X_test)
# Calculate the prediction accuracy
accuracy = accuracy_score(y_test, y_pred)
# Print the prediction accuracy
print(f"Prediction Accuracy: {accuracy:.4f}")
print(f"Accuracy Percentage: {accuracy * 100:.2f}%")
Prediction Accuracy: 0.9708
Accuracy Percentage: 97.08%
Understanding the Results
In this example, the model reaches an accuracy of approximately 97.08%, meaning it correctly predicted the class (malignant or benign) for 166 of the 171 test samples, or about 97 out of every 100. This indicates strong performance on the held-out test split of this dataset.
Alternative Accuracy Calculation
You can also calculate accuracy manually by comparing predictions with actual values:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Load and split data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.3, random_state=42
)
# Train model and make predictions
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
# Manual accuracy calculation: count the predictions that match the true labels
correct_predictions = (y_pred == y_test).sum()
total_predictions = len(y_test)
manual_accuracy = correct_predictions / total_predictions
print(f"Correct predictions: {correct_predictions}")
print(f"Total predictions: {total_predictions}")
print(f"Manual accuracy: {manual_accuracy:.4f}")
Correct predictions: 166
Total predictions: 171
Manual accuracy: 0.9708
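As a further cross-check, scikit-learn classifiers also provide a `score()` method, which for classifiers returns the mean accuracy on the data passed to it. The sketch below assumes the same dataset, split, and model settings as the examples above and compares the three approaches. The variable names `acc_metric`, `acc_score`, and `acc_manual` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Same data, split, and model settings as in the examples above
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Three routes to the same number
acc_metric = accuracy_score(y_test, y_pred)  # metric function on predictions
acc_score = lr.score(X_test, y_test)         # classifier's built-in mean accuracy
acc_manual = np.mean(y_pred == y_test)       # fraction of matching labels

# Check that the three values agree up to floating-point tolerance
print(np.isclose(acc_metric, acc_score), np.isclose(acc_metric, acc_manual))
```

`score()` is convenient when the predictions themselves are not needed, while `accuracy_score()` fits better when `y_pred` is reused for other checks.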
Conclusion
Prediction accuracy is a fundamental metric for evaluating logistic regression models. Use accuracy_score() from scikit-learn for quick calculations, or compute it manually by dividing the number of correct predictions by the total number of predictions. On a held-out test set, a higher accuracy score generally indicates better classification performance.
