How to Conduct Grid Search Using Python?

Grid search is a systematic approach to hyperparameter tuning in machine learning. It evaluates all possible combinations of specified hyperparameters to find the optimal configuration. Python's Scikit-learn provides powerful tools like GridSearchCV and RandomizedSearchCV to automate this process with cross-validation.

Understanding Grid Search

Grid search works by defining a parameter grid containing different values for each hyperparameter. The algorithm trains and evaluates the model for every combination, selecting the configuration that yields the best cross-validation score.
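To make "every combination" concrete, here is a minimal sketch (using a hypothetical two-parameter grid) of how the combinations are enumerated. This is essentially what GridSearchCV does internally before fitting a model per combination:

```python
from itertools import product

# Hypothetical grid: 4 values of C x 2 kernels = 8 combinations
param_grid = {'C': [0.1, 1, 10, 100], 'kernel': ['linear', 'rbf']}

keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in product(*param_grid.values())]

print(len(combos))   # 8 combinations
print(combos[0])     # {'C': 0.1, 'kernel': 'linear'}
```

With 5-fold cross-validation, each of these 8 combinations is trained and scored 5 times, so the total cost is 40 model fits.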

Complete Grid Search Example

Creating the Dataset

First, let's create a synthetic dataset using Scikit-learn:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.metrics import classification_report

# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                         n_redundant=0, n_clusters_per_class=1, random_state=42)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
Output:

Training set size: 800
Test set size: 200

Performing Grid Search

Now we'll create an SVM model and define a parameter grid to search through:

# Create SVM model
model = SVC()

# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf']
}

# Perform grid search with 5-fold cross-validation
grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)

# Make predictions using best parameters
y_pred = grid.predict(X_test)

print("Best Hyperparameters:", grid.best_params_)
print("Best Cross-validation Score:", round(grid.best_score_, 4))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Output:

Best Hyperparameters: {'C': 10, 'gamma': 1, 'kernel': 'rbf'}
Best Cross-validation Score: 0.94

Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.97      0.94       104
           1       0.97      0.91      0.94        96

    accuracy                           0.94       200
   macro avg       0.94      0.94      0.94       200
weighted avg       0.94      0.94      0.94       200
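Beyond best_params_, a fitted GridSearchCV stores the score of every combination in its cv_results_ attribute, which is convenient to inspect as a pandas DataFrame. The snippet below refits a small search on synthetic data so it stands alone; with the search above, you would reuse your own fitted grid object instead:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small self-contained search (3 values of C x 2 kernels = 6 combinations)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}, cv=5)
grid.fit(X, y)

# cv_results_ holds per-combination scores; rank_test_score orders them
results = pd.DataFrame(grid.cv_results_)
cols = ['param_C', 'param_kernel', 'mean_test_score', 'rank_test_score']
print(results.sort_values('rank_test_score')[cols].to_string(index=False))
```

This view is useful for spotting whether several configurations score nearly as well as the winner, in which case the simpler or cheaper one may be preferable.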

Randomized Search Alternative

For large parameter spaces, RandomizedSearchCV is more efficient as it samples random combinations rather than testing all possibilities:

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# Define parameter distributions for random search
param_dist = {
    'C': uniform(0.1, 100),      # uniform(loc, scale) samples from [0.1, 100.1]
    'gamma': uniform(0.1, 100),
    'kernel': ['linear', 'rbf']
}

# Perform randomized search
random_search = RandomizedSearchCV(model, param_distributions=param_dist, 
                                 n_iter=20, cv=5, random_state=42)
random_search.fit(X_train, y_train)
y_pred_random = random_search.predict(X_test)

print("Randomized Search Best Parameters:", random_search.best_params_)
print("Randomized Search Best Score:", round(random_search.best_score_, 4))
print("\nTest Accuracy:", round(random_search.score(X_test, y_test), 4))
Output:

Randomized Search Best Parameters: {'C': 24.56, 'gamma': 3.21, 'kernel': 'rbf'}
Randomized Search Best Score: 0.9375
Test Accuracy: 0.94
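Because C and gamma typically matter on a logarithmic scale (0.01 vs 0.1 is as significant as 10 vs 100), sampling them uniformly concentrates almost all draws at large values. One common refinement, sketched here with SciPy's loguniform distribution, is to sample on a log scale instead:

```python
from scipy.stats import loguniform

# Sample C and gamma log-uniformly between 1e-2 and 1e2
param_dist_log = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-2, 1e2),
    'kernel': ['linear', 'rbf']
}

# Draw a few samples to see the spread across orders of magnitude
samples = param_dist_log['C'].rvs(size=5, random_state=42)
print(samples)
```

The resulting distribution can be passed to RandomizedSearchCV via param_distributions exactly like param_dist above.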

Comparison Table

Method               Search Strategy    Time Complexity   Best For
Grid Search          Exhaustive         Higher            Small parameter spaces
Randomized Search    Random sampling    Lower             Large parameter spaces
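The cost difference in the table can be made concrete. The grid defined earlier has 4 × 4 × 2 = 32 combinations, each fitted once per fold, while the randomized search caps the budget at n_iter = 20 regardless of grid size. Scikit-learn's ParameterGrid lets you count combinations before committing to a run:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf']
}

n_combinations = len(ParameterGrid(param_grid))
cv_folds = 5
print(f"Grid search fits:       {n_combinations * cv_folds}")   # 32 * 5 = 160
print(f"Randomized search fits: {20 * cv_folds}")               # n_iter * cv = 100
```

Checking this count first is a cheap way to decide between the two strategies before launching a potentially long search.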

Key Parameters

  • cv − Number of cross-validation folds (default: 5)
  • scoring − Metric to optimize ('accuracy', 'f1', 'roc_auc', etc.)
  • n_jobs − Number of parallel jobs (-1 uses all processors)
  • refit − Whether to refit the model on the full training set with the best parameters (default: True)
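As a small sketch of how these parameters combine, the search below (on synthetic data, with an illustrative one-parameter grid) optimizes F1 instead of the default accuracy, and, because refit=True, exposes the refitted best model directly via best_estimator_:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=42)

search = GridSearchCV(
    SVC(),
    {'C': [0.1, 1, 10]},
    cv=3,            # 3-fold cross-validation
    scoring='f1',    # optimize F1 instead of the default accuracy
    n_jobs=-1,       # use all available cores
    refit=True,      # refit on the full training data with the best C
)
search.fit(X, y)

# refit=True makes the fitted best model available directly
print(search.best_estimator_)
print(round(search.best_score_, 4))
```

Choosing scoring to match your real objective (e.g. 'f1' or 'roc_auc' for imbalanced classes) matters as much as the grid itself, since the "best" parameters are best only under the chosen metric.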

Conclusion

Grid search finds optimal hyperparameters through systematic, exhaustive evaluation, while randomized search offers a faster alternative for large parameter spaces. Use GridSearchCV for thorough exploration of small grids and RandomizedSearchCV when computational efficiency is crucial.

Updated on: 2026-03-27T10:39:19+05:30
