How to plot xgboost.XGBClassifier.feature_importances_ with Matplotlib?
The XGBClassifier from XGBoost provides feature importance scores through the feature_importances_ attribute. We can visualize these importance scores using Matplotlib to understand which features contribute most to the model's predictions.
Understanding Feature Importances
Feature importance in XGBoost measures how useful each feature is for making accurate predictions across the trees in the ensemble. Higher values indicate features that play a larger role in the model's decision-making.
Basic Feature Importance Plot
Here's how to create a feature importance plot using synthetic data:
```python
import numpy as np
from xgboost import XGBClassifier
import matplotlib.pyplot as plt

# Create synthetic dataset
np.random.seed(42)
X = np.random.rand(1000, 8)
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # Simple classification rule

# Train XGBoost model
model = XGBClassifier(random_state=42, eval_metric='logloss')
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_
feature_names = [f'Feature_{i}' for i in range(len(importances))]

# Create bar plot
plt.figure(figsize=(10, 6))
plt.bar(range(len(importances)), importances)
plt.xlabel('Features')
plt.ylabel('Importance Score')
plt.title('XGBoost Feature Importances')
plt.xticks(range(len(importances)), feature_names, rotation=45)
plt.tight_layout()
plt.show()

print("Feature Importances:", importances)
```

Output:

```
Feature Importances: [0.49586776 0.4958678  0.00413223 0.00413221 0. 0. 0. 0. ]
```
Enhanced Visualization with Feature Names
For better readability, we can sort features by importance and add proper labels:
```python
import numpy as np
from xgboost import XGBClassifier
import matplotlib.pyplot as plt

# Create synthetic dataset
np.random.seed(42)
X = np.random.rand(500, 6)
y = (X[:, 0] * 2 + X[:, 2] * 0.5 > 1).astype(int)

# Train model
model = XGBClassifier(random_state=42, eval_metric='logloss')
model.fit(X, y)

# Get importances and feature names
importances = model.feature_importances_
features = ['Age', 'Income', 'Score', 'Experience', 'Rating', 'Hours']

# Sort by importance (descending)
indices = np.argsort(importances)[::-1]
sorted_features = [features[i] for i in indices]
sorted_importances = importances[indices]

# Create horizontal bar plot
plt.figure(figsize=(10, 6))
colors = plt.cm.viridis(np.linspace(0, 1, len(sorted_importances)))
bars = plt.barh(sorted_features, sorted_importances, color=colors)
plt.xlabel('Importance Score')
plt.title('XGBoost Feature Importances (Sorted)')
plt.gca().invert_yaxis()  # Highest importance at top

# Add value labels on bars
for bar, importance in zip(bars, sorted_importances):
    plt.text(bar.get_width() + 0.01, bar.get_y() + bar.get_height() / 2,
             f'{importance:.3f}', va='center')

plt.tight_layout()
plt.show()
```
(Horizontal bar chart showing sorted feature importances)
Comparison Table
| Plot Type | Best For | Advantages |
|---|---|---|
| Vertical Bar | Few features (<10) | Compact, easy comparison |
| Horizontal Bar | Many features or long names | Better label readability |
| Sorted Plot | Identifying top features | Clear ranking visualization |
Key Parameters
Important considerations when plotting feature importances:
- figsize − controls the plot dimensions (passed to plt.figure)
- rotation − rotates x-axis labels for readability
- tight_layout() − prevents label cutoff
- eval_metric − suppresses XGBoost warnings about the default metric
Conclusion
XGBoost feature importances help identify which features contribute most to model predictions. Use horizontal bar plots for better readability with many features, and always sort by importance to highlight the most influential variables.
