How to Use Python for Ensemble Learning?

Ensemble learning is a machine learning technique that combines the predictions of multiple models to improve overall performance. Because the individual models tend to make different errors, combining their predictions can reduce overfitting and generalize better than any single model.
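To build intuition for why combining models helps, here is a small simulation (an illustration added here, not part of any library workflow): three hypothetical classifiers that are each correct about 70% of the time, with independent errors, outperform any one of them under a majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y = rng.integers(0, 2, n)  # ground-truth binary labels

# Three simulated models, each correct ~70% of the time, errors independent
preds = [np.where(rng.random(n) < 0.7, y, 1 - y) for _ in range(3)]

# Majority vote: predict 1 when at least two of the three models predict 1
vote = (np.sum(preds, axis=0) >= 2).astype(int)

single_acc = float(np.mean(preds[0] == y))
vote_acc = float(np.mean(vote == y))
print(f"single model: {single_acc:.3f}, majority vote: {vote_acc:.3f}")
```

With fully independent errors, the vote is correct whenever at least two models are, giving an expected accuracy of 0.7^3 + 3(0.7^2)(0.3) = 0.784. Real ensembles approximate this effect to the extent that their members' errors are only partially correlated.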

This approach has proven successful in applications like image classification, speech recognition, and natural language processing. In this tutorial, we'll explore four ensemble learning methods, each with a Python implementation: bagging, boosting, stacking, and voting.

Bagging

Bagging (Bootstrap Aggregating) trains multiple models on bootstrap samples of the training data and aggregates their predictions by averaging or voting. Random Forest is itself a popular bagging algorithm that trains decision trees on bootstrap samples with random feature subsets.

Example

Here's how to implement bagging using Random Forest on the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create base model and bagging ensemble
# (note: Random Forest already performs bagging internally, so this wraps
# one bagging method inside another; a plain DecisionTreeClassifier is
# also a common choice of base estimator)
base_model = RandomForestClassifier(random_state=42)
bagging_ensemble = BaggingClassifier(estimator=base_model, n_estimators=10, random_state=42)

# Train and predict
bagging_ensemble.fit(X_train, y_train)
y_pred = bagging_ensemble.predict(X_test)

# Evaluate performance
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.94      0.97        16
   virginica       0.93      1.00      0.97        10

    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45

Boosting

Boosting trains models sequentially, where each new model attempts to correct the errors of the previous ones. AdaBoost is a popular boosting algorithm that increases the weights of misclassified examples so that later models focus on the hard cases.

Example

Here's a boosting implementation using AdaBoost:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load dataset and split
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create base model and boosting ensemble
base_model = DecisionTreeClassifier(max_depth=1, random_state=42)
boosting_ensemble = AdaBoostClassifier(estimator=base_model, n_estimators=50, random_state=42)

# Train and predict
boosting_ensemble.fit(X_train, y_train)
y_pred = boosting_ensemble.predict(X_test)

# Evaluate
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       0.94      1.00      0.97        16
   virginica       1.00      0.90      0.95        10

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45

Stacking

Stacking trains multiple base models and uses their predictions as input to a meta-learner. This creates a two-level learning architecture where the meta-model learns how to best combine base model predictions.

Example

Here's a stacking implementation using scikit-learn's StackingClassifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create base models
base_models = [
    ('lr', LogisticRegression(random_state=42, max_iter=200)),
    ('svm', SVC(random_state=42, probability=True)),
    ('rf', RandomForestClassifier(random_state=42))
]

# Create meta-learner
meta_learner = LogisticRegression(random_state=42)

# Create stacking ensemble
stacking_ensemble = StackingClassifier(estimators=base_models, final_estimator=meta_learner, cv=5)

# Train and predict
stacking_ensemble.fit(X_train, y_train)
y_pred = stacking_ensemble.predict(X_test)

# Evaluate
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        16
   virginica       1.00      1.00      1.00        10

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Voting

Voting combines predictions from multiple models using either hard voting (majority vote) or soft voting (average of predicted probabilities). It's the simplest ensemble method.

Example

Here's a voting ensemble implementation:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create base models
models = [
    ('lr', LogisticRegression(random_state=42, max_iter=200)),
    ('svm', SVC(random_state=42, probability=True)),
    ('rf', RandomForestClassifier(random_state=42))
]

# Create voting ensemble (soft voting)
voting_ensemble = VotingClassifier(estimators=models, voting='soft')

# Train and predict
voting_ensemble.fit(X_train, y_train)
y_pred = voting_ensemble.predict(X_test)

# Evaluate
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        16
   virginica       1.00      1.00      1.00        10

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
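For comparison, the hard-voting variant can be sketched as follows (an added sketch, assuming the same data split as above). With hard voting each model casts one vote and the majority class wins, so SVC no longer needs probability=True:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Same dataset and split as the soft-voting example
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Hard voting: each model casts one vote; the majority class wins
hard_voting = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42, max_iter=200)),
        ('svm', SVC(random_state=42)),  # no probability=True needed
        ('rf', RandomForestClassifier(random_state=42)),
    ],
    voting='hard',
)
hard_voting.fit(X_train, y_train)
print(f"Hard-voting accuracy: {hard_voting.score(X_test, y_test):.2f}")
```

Soft voting usually edges out hard voting when the base models produce well-calibrated probabilities, since it weighs how confident each model is rather than just counting votes.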

Comparison of Ensemble Methods

Method     Training      Combination            Best For
------     --------      -----------            --------
Bagging    Parallel      Average/Vote           Reducing variance
Boosting   Sequential    Weighted combination   Reducing bias
Stacking   Two-level     Meta-learner           Complex patterns
Voting     Independent   Vote/Average           Simple combination

Conclusion

Ensemble learning can significantly improve model performance by combining multiple algorithms. Choose bagging for variance reduction, boosting for bias reduction, stacking for learning complex combinations of models, and voting for a simple, direct combination.

Updated on: 2026-03-27T16:37:09+05:30
