
CatBoost - Classifier
The CatBoost Classifier is a useful tool for classification problems, particularly when the data includes categorical variables. Its core method is gradient boosting, which combines many weak models into a single, powerful model. One of CatBoost's primary features is its ability to handle categorical data directly, without requiring conversion into numerical values.
Now we will go through how to use the CatBoost Classifier, taking an example where we use some data to make a prediction.
Steps to use CatBoostClassifier
Now let us walk through the steps of using CatBoost Classifier −
1. Prepare Your Data
Data: To train the model, you first need a dataset. Suppose you have a list of people together with their age, gender, and salary, and you want to predict whether or not each person will buy a product. Here −
Features (X): The inputs, such as age, gender, and salary.
Labels (y): The answer, i.e. whether or not they purchased the product.
Training Data: The data the model learns from.
Testing Data: The data you use to check whether the model works well.
Example
Here is an example of how you can prepare the data −
from sklearn.model_selection import train_test_split

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2)
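As a rough illustration of what train_test_split does behind the scenes, here is a hand-rolled version in plain Python (the fixed seed and the 20% split are illustrative choices, not scikit-learn internals) −

```python
import random

def split(data, labels, test_size=0.2, seed=42):
    """Shuffle the row indices, then hold out a fraction for testing."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(data) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    X_train = [data[i] for i in train]
    X_test = [data[i] for i in test]
    y_train = [labels[i] for i in train]
    y_test = [labels[i] for i in test]
    return X_train, X_test, y_train, y_test

# ten people described by [age, gender, salary]
people = [[25, 0, 30000], [40, 1, 50000], [35, 0, 45000],
          [22, 1, 28000], [50, 0, 80000], [31, 1, 42000],
          [28, 0, 33000], [45, 1, 60000], [38, 0, 52000],
          [27, 1, 31000]]
bought = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
X_train, X_test, y_train, y_test = split(people, bought)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before splitting matters: it prevents any ordering in the file (say, sorted by salary) from ending up entirely in one of the two sets.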
2. Build the CatBoost Classifier
Next, the CatBoost model is generated. You can change a number of parameters, like −
iterations: How many small models (trees) it will build. Each tree learns from the errors of the previous ones.
learning_rate: How fast the model learns. If it is too high, the model may overshoot and miss important details; if it is too low, training will take a long time.
depth: The depth of each tree. A deeper tree can capture more complex patterns, but overfitting becomes more likely.
Example
Here is an example of how you can build a CatBoost classifier −
from catboost import CatBoostClassifier

model = CatBoostClassifier(iterations=1000, learning_rate=0.1, depth=6)
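The way iterations and learning_rate interact can be seen in a tiny, pure-Python gradient-boosting loop. This is an illustration of the general boosting idea with one-split "stumps" and squared loss, not CatBoost's actual algorithm −

```python
# Toy gradient boosting: each round fits a one-split "stump" to the
# current residuals and adds a learning_rate-scaled correction.

def fit_stump(x, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, iterations=50, learning_rate=0.1):
    pred = [0.0] * len(y)
    for _ in range(iterations):
        # each new tree corrects the errors of the ensemble so far
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
print([round(p, 2) for p in boost(x, y)])  # [0.0, 0.0, 0.0, 0.99, 0.99, 0.99]
```

With learning_rate=0.1, each round closes only 10% of the remaining gap, so many iterations are needed to approach the true labels; a higher rate converges faster but can overshoot on noisier data.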
3. Train the Model
Once the data and model are ready, the model needs to be trained, which means it will go through the training data and look for patterns. Here the fit method tells the model to learn from the training data (X_train and y_train), and verbose=100 means the model prints its progress every 100 iterations so you can watch it learn.
model.fit(X_train, y_train, verbose=100)
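The effect of verbose=100 can be mimicked with a toy training loop that logs every 100th iteration (the decreasing loss value here is a stand-in, not real training) −

```python
def train_with_logging(iterations=500, verbose=100):
    """Toy loop that reports progress every `verbose` iterations."""
    logged = []
    for i in range(1, iterations + 1):
        loss = 1.0 / i  # stand-in for a decreasing training loss
        if i % verbose == 0:
            print(f"iteration {i}: loss = {loss:.4f}")
            logged.append(i)
    return logged

train_with_logging()
```

Watching the logged loss is useful: if it stops improving long before the final iteration, you are paying for trees that add nothing.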
4. Make Predictions
After the model has been trained, you can use it to make predictions. For example, when you give it new data (X_test), it will predict the corresponding results.
preds = model.predict(X_test)
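Under the hood, a boosted binary classifier typically sums the outputs of all its trees into one raw score, squashes that score into a probability with a sigmoid, and then thresholds it. A minimal sketch (the function name and 0.5 threshold are illustrative assumptions) −

```python
import math

def predict_label(raw_score, threshold=0.5):
    """Convert the summed tree outputs into a 0/1 class label."""
    prob = 1 / (1 + math.exp(-raw_score))  # sigmoid maps score to (0, 1)
    return 1 if prob >= threshold else 0

print(predict_label(2.0), predict_label(-1.0))  # 1 0
```

A large positive score means the trees collectively vote for class 1; a negative score votes for class 0.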
5. Evaluate the Model
After making your predictions, you have to evaluate the model's performance. One way to do this is by checking how accurate the model's predictions are.
accuracy = accuracy_score(y_test, preds)
print(f"Accuracy is: {accuracy}")
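What accuracy_score computes is simply the fraction of predictions that match the true labels; a plain-Python equivalent makes that concrete −

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75 (3 of 4 correct)
```

Keep in mind that accuracy can be misleading on imbalanced data, which is why the larger example below also prints a full classification report.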
Example to use CatBoostClassifier
We will use the Housing.csv dataset to show how to create a CatBoost Classifier model, following the steps above. The price column is continuous, so we first turn it into a binary label (above or below the median price) − a classifier needs discrete classes, not raw prices. Note also that furnishingstatus has more than two categories, so it is label-encoded rather than mapped to yes/no −
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the dataset
data = pd.read_csv('/Python/Housing.csv')

# 2. Preprocess the data: convert the yes/no columns to numeric
binary_features = ['mainroad', 'guestroom', 'basement',
                   'hotwaterheating', 'airconditioning', 'prefarea']
for col in binary_features:
    data[col] = data[col].map({'yes': 1, 'no': 0})

# furnishingstatus has three categories, so label-encode it
data['furnishingstatus'] = pd.factorize(data['furnishingstatus'])[0]

# 3. Build the target: is the house priced above the median?
y = (data['price'] > data['price'].median()).astype(int)
X = data.drop('price', axis=1)

# 4. Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. Train the model
model = CatBoostClassifier(iterations=100, depth=6, learning_rate=0.1,
                           random_seed=42, verbose=0)
model.fit(X_train, y_train)

# 6. Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:")
print(report)
Output
The script prints the overall accuracy followed by a classification report with precision, recall, F1-score, and support for the two classes (0 = below the median price, 1 = above it); the exact figures depend on the dataset and the random split.