How to conduct grid search using Python?
Grid search is a systematic approach to hyperparameter tuning in machine learning. It evaluates all possible combinations of specified hyperparameters to find the optimal configuration. Python's Scikit-learn provides powerful tools like GridSearchCV and RandomizedSearchCV to automate this process with cross-validation.
Understanding Grid Search
Grid search works by defining a parameter grid containing different values for each hyperparameter. The algorithm trains and evaluates the model for every combination, selecting the configuration that yields the best cross-validation score.
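The "every combination" idea can be sketched with plain Python before bringing in Scikit-learn. This is an illustrative enumeration, not Scikit-learn's internal implementation; the small grid below is hypothetical:

```python
from itertools import product

# Hypothetical parameter grid: each key maps to candidate values
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
}

# Cross every value of every hyperparameter, exactly as grid search does
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
print(len(combos))  # 3 values of C x 2 kernels = 6 combinations
```

Grid search would train and score one model per entry in `combos`, which is why the cost grows multiplicatively with each added hyperparameter.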
Complete Grid Search Example
Creating the Dataset
First, let's create a synthetic dataset using Scikit-learn:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
Training set size: 800
Test set size: 200
Performing Grid Search
Now we'll create an SVM model and define a parameter grid to search through:
# Create SVM model
model = SVC()
# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf']
}
# Perform grid search with 5-fold cross-validation
grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)
# Make predictions using best parameters
y_pred = grid.predict(X_test)
print("Best Hyperparameters:", grid.best_params_)
print("Best Cross-validation Score:", round(grid.best_score_, 4))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Best Hyperparameters: {'C': 10, 'gamma': 1, 'kernel': 'rbf'}
Best Cross-validation Score: 0.94
Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.97      0.94       104
           1       0.97      0.91      0.94        96

    accuracy                           0.94       200
   macro avg       0.94      0.94      0.94       200
weighted avg       0.94      0.94      0.94       200
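Beyond the single best result, GridSearchCV records the score of every combination in its cv_results_ attribute, which is useful for seeing how close the runners-up were. A minimal self-contained sketch (it refits a small grid rather than reusing the objects above, and assumes pandas is installed):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small illustrative fit; in the article this would be the fitted `grid` above
X, y = make_classification(n_samples=200, random_state=42)
grid = GridSearchCV(SVC(), {'C': [0.1, 1], 'kernel': ['linear', 'rbf']}, cv=3)
grid.fit(X, y)

# cv_results_ holds one row per combination: parameters, mean and std of
# the cross-validation scores, and a rank
results = pd.DataFrame(grid.cv_results_)
print(results.sort_values('rank_test_score')[
    ['param_C', 'param_kernel', 'mean_test_score', 'std_test_score']
].to_string(index=False))
```

A small std_test_score alongside a high mean suggests the configuration is stable across folds, not just lucky on one split.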
Randomized Search Alternative
For large parameter spaces, RandomizedSearchCV is more efficient because it samples a fixed number of random combinations rather than testing every possibility:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
# Define parameter distributions for random search
param_dist = {
    'C': uniform(0.1, 100),
    'gamma': uniform(0.1, 100),
    'kernel': ['linear', 'rbf']
}
# Perform randomized search
random_search = RandomizedSearchCV(model, param_distributions=param_dist,
                                   n_iter=20, cv=5, random_state=42)
random_search.fit(X_train, y_train)
y_pred_random = random_search.predict(X_test)
print("Randomized Search Best Parameters:", random_search.best_params_)
print("Randomized Search Best Score:", round(random_search.best_score_, 4))
print("\nTest Accuracy:", round(random_search.score(X_test, y_test), 4))
Randomized Search Best Parameters: {'C': 24.56, 'gamma': 3.21, 'kernel': 'rbf'}
Randomized Search Best Score: 0.9375
Test Accuracy: 0.94
Comparison Table
| Method | Search Strategy | Time Complexity | Best For |
|---|---|---|---|
| Grid Search | Exhaustive | Higher | Small parameter spaces |
| Randomized Search | Random sampling | Lower | Large parameter spaces |
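The time difference in the table can be made concrete by counting model fits: grid search fits folds times the product of the grid sizes, while randomized search fits folds times n_iter. Applying that arithmetic to the examples above:

```python
from math import prod

# Grid from the example: 4 values of C x 4 values of gamma x 2 kernels
grid_sizes = [4, 4, 2]
cv_folds = 5

grid_fits = prod(grid_sizes) * cv_folds   # exhaustive: every combination
random_fits = 20 * cv_folds               # n_iter=20 sampled combinations
print(grid_fits, random_fits)  # 160 vs 100
```

Adding one more hyperparameter with 4 candidate values would quadruple the grid search fits but leave the randomized search cost unchanged.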
Key Parameters
- cv: Number of cross-validation folds (default: 5)
- scoring: Metric to optimize ('accuracy', 'f1', 'roc_auc', etc.)
- n_jobs: Number of parallel jobs (-1 uses all processors)
- refit: Whether to refit the model on the full training set with the best parameters (default: True)
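These parameters can be combined in one call. A minimal sketch on a synthetic dataset, optimizing F1 instead of accuracy (the dataset and grid here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=42)

# f1 as the metric, 3 folds, all CPU cores, and refit=True so that
# best_estimator_ is retrained on the full training set afterwards
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3,
                      scoring='f1', n_jobs=-1, refit=True)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

With refit=True (the default), search.predict and search.best_estimator_ use the winning configuration directly, so no manual retraining is needed.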
Conclusion
Grid search systematically finds optimal hyperparameters through exhaustive testing, while randomized search offers a faster alternative for large parameter spaces. Use GridSearchCV for thorough exploration and RandomizedSearchCV when computational efficiency is crucial.
