How to conduct grid search using Python?
Grid search is a systematic approach to hyperparameter tuning in machine learning. It evaluates all possible combinations of specified hyperparameters to find the optimal configuration. Python's Scikit-learn provides powerful tools like GridSearchCV and RandomizedSearchCV to automate this process with cross-validation.
Understanding Grid Search
Grid search works by defining a parameter grid containing different values for each hyperparameter. The algorithm trains and evaluates the model for every combination, selecting the configuration that yields the best cross-validation score.
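The "every combination" idea can be sketched with plain Python before bringing in Scikit-learn. This is an illustrative enumeration, not Scikit-learn's internal implementation; the small grid below is hypothetical:

```python
from itertools import product

# Hypothetical parameter grid: each key maps to candidate values
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
}

# Cross every value of every hyperparameter, exactly as grid search does
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
print(len(combos))  # 3 values of C x 2 kernels = 6 combinations
```

Grid search would train and score one model per entry in `combos`, which is why the cost grows multiplicatively with each added hyperparameter.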
Complete Grid Search Example
Creating the Dataset
First, let's create a synthetic dataset using Scikit-learn:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
Training set size: 800
Test set size: 200
Performing Grid Search
Now we'll create an SVM model and define a parameter grid to search through:
# Create SVM model
model = SVC()
# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf']
}
# Perform grid search with 5-fold cross-validation
grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)
# Make predictions using best parameters
y_pred = grid.predict(X_test)
print("Best Hyperparameters:", grid.best_params_)
print("Best Cross-validation Score:", round(grid.best_score_, 4))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Best Hyperparameters: {'C': 10, 'gamma': 1, 'kernel': 'rbf'}
Best Cross-validation Score: 0.94
Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.97      0.94       104
           1       0.97      0.91      0.94        96

    accuracy                           0.94       200
   macro avg       0.94      0.94      0.94       200
weighted avg       0.94      0.94      0.94       200
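Beyond the single best result, GridSearchCV records the score of every combination in its cv_results_ attribute, which is useful for seeing how close the runners-up were. A minimal self-contained sketch (it refits a small grid rather than reusing the objects above, and assumes pandas is installed):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small illustrative fit; in the article this would be the fitted `grid` above
X, y = make_classification(n_samples=200, random_state=42)
grid = GridSearchCV(SVC(), {'C': [0.1, 1], 'kernel': ['linear', 'rbf']}, cv=3)
grid.fit(X, y)

# cv_results_ holds one row per combination: parameters, mean and std of
# the cross-validation scores, and a rank
results = pd.DataFrame(grid.cv_results_)
print(results.sort_values('rank_test_score')[
    ['param_C', 'param_kernel', 'mean_test_score', 'std_test_score']
].to_string(index=False))
```

A small std_test_score alongside a high mean suggests the configuration is stable across folds, not just lucky on one split.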
Randomized Search Alternative
For large parameter spaces, RandomizedSearchCV is more efficient because it samples a fixed number of random combinations rather than testing every possibility:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
# Define parameter distributions for random search
param_dist = {
    'C': uniform(0.1, 100),
    'gamma': uniform(0.1, 100),
    'kernel': ['linear', 'rbf']
}
# Perform randomized search
random_search = RandomizedSearchCV(model, param_distributions=param_dist,
                                   n_iter=20, cv=5, random_state=42)
random_search.fit(X_train, y_train)
y_pred_random = random_search.predict(X_test)
print("Randomized Search Best Parameters:", random_search.best_params_)
print("Randomized Search Best Score:", round(random_search.best_score_, 4))
print("\nTest Accuracy:", round(random_search.score(X_test, y_test), 4))
Randomized Search Best Parameters: {'C': 24.56, 'gamma': 3.21, 'kernel': 'rbf'}
Randomized Search Best Score: 0.9375
Test Accuracy: 0.94
Comparison Table
| Method | Search Strategy | Time Complexity | Best For |
|---|---|---|---|
| Grid Search | Exhaustive | Higher | Small parameter spaces |
| Randomized Search | Random sampling | Lower | Large parameter spaces |
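The time difference in the table can be made concrete by counting model fits: grid search fits folds times the product of the grid sizes, while randomized search fits folds times n_iter. Applying that arithmetic to the examples above:

```python
from math import prod

# Grid from the example: 4 values of C x 4 values of gamma x 2 kernels
grid_sizes = [4, 4, 2]
cv_folds = 5

grid_fits = prod(grid_sizes) * cv_folds   # exhaustive: every combination
random_fits = 20 * cv_folds               # n_iter=20 sampled combinations
print(grid_fits, random_fits)  # 160 vs 100
```

Adding one more hyperparameter with 4 candidate values would quadruple the grid search fits but leave the randomized search cost unchanged.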
Key Parameters
- cv: Number of cross-validation folds (default: 5)
- scoring: Metric to optimize ('accuracy', 'f1', 'roc_auc', etc.)
- n_jobs: Number of parallel jobs (-1 uses all processors)
- refit: Whether to refit the model on the full training set with the best parameters (default: True)
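These parameters can be combined in one call. A minimal sketch on a synthetic dataset, optimizing F1 instead of accuracy (the dataset and grid here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=42)

# f1 as the metric, 3 folds, all CPU cores, and refit=True so that
# best_estimator_ is retrained on the full training set afterwards
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3,
                      scoring='f1', n_jobs=-1, refit=True)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

With refit=True (the default), search.predict and search.best_estimator_ use the winning configuration directly, so no manual retraining is needed.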
Conclusion
Grid search systematically finds optimal hyperparameters through exhaustive testing, while randomized search offers a faster alternative for large parameter spaces. Use GridSearchCV for thorough exploration and RandomizedSearchCV when computational efficiency is crucial.
