
CatBoost - Classifier
The CatBoost Classifier is a useful tool for classification problems, particularly when the data includes categorical variables. Its core method is gradient boosting, which combines many weak models into a single, powerful model. One of CatBoost's primary features is its ability to handle categorical data directly, without requiring conversion into numerical values.
Now we will go through how to use the CatBoost Classifier, taking an example where we use some data to make a prediction.
Steps to use CatBoostClassifier
Now let us walk through the steps of using CatBoost Classifier −
1. Prepare Your Data
Data: To train the model, you first need a dataset. Suppose you have a list of people together with their age, gender, and salary, and you want to predict whether or not each person will buy a product. Here −
Features (X): The inputs, such as age, gender, and salary.
Labels (y): The answer, i.e. whether or not they purchased the product.
Training Data: The data the model learns from.
Testing Data: The data you use to check whether the model works well.
Example
Here is an example of how you can prepare the data −
from sklearn.model_selection import train_test_split

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2)
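As a rough illustration of what train_test_split does behind the scenes, here is a hand-rolled version in plain Python (the fixed seed and the 20% split are illustrative choices, not scikit-learn internals) −

```python
import random

def split(data, labels, test_size=0.2, seed=42):
    """Shuffle the row indices, then hold out a fraction for testing."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(data) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    X_train = [data[i] for i in train]
    X_test = [data[i] for i in test]
    y_train = [labels[i] for i in train]
    y_test = [labels[i] for i in test]
    return X_train, X_test, y_train, y_test

# ten people described by [age, gender, salary]
people = [[25, 0, 30000], [40, 1, 50000], [35, 0, 45000],
          [22, 1, 28000], [50, 0, 80000], [31, 1, 42000],
          [28, 0, 33000], [45, 1, 60000], [38, 0, 52000],
          [27, 1, 31000]]
bought = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
X_train, X_test, y_train, y_test = split(people, bought)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before splitting matters: it prevents any ordering in the file (say, sorted by salary) from ending up entirely in one of the two sets.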
2. Build the CatBoost Classifier
Next, the CatBoost model is generated. You can change a number of parameters, like −
iterations: How many small models (trees) it will build. Each tree learns from the errors of the previous ones.
learning_rate: How fast the model learns. If it is too high, the model may overshoot and miss important details; if it is too low, training will take a long time.
depth: The depth of each tree. A deeper tree can capture more complex patterns, but overfitting becomes more likely.
Example
Here is an example of how you can build a CatBoost classifier −
from catboost import CatBoostClassifier

model = CatBoostClassifier(iterations=1000, learning_rate=0.1, depth=6)
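The way iterations and learning_rate interact can be seen in a tiny, pure-Python gradient-boosting loop. This is an illustration of the general boosting idea with one-split "stumps" and squared loss, not CatBoost's actual algorithm −

```python
# Toy gradient boosting: each round fits a one-split "stump" to the
# current residuals and adds a learning_rate-scaled correction.

def fit_stump(x, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, iterations=50, learning_rate=0.1):
    pred = [0.0] * len(y)
    for _ in range(iterations):
        # each new tree corrects the errors of the ensemble so far
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
print([round(p, 2) for p in boost(x, y)])  # [0.0, 0.0, 0.0, 0.99, 0.99, 0.99]
```

With learning_rate=0.1, each round closes only 10% of the remaining gap, so many iterations are needed to approach the true labels; a higher rate converges faster but can overshoot on noisier data.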
3. Train the Model
Once the data and model are ready, the model needs to be trained, which means it will go through the training data and look for patterns. Here the fit method tells the model to learn from the training data (X_train and y_train), and verbose=100 means the model prints its progress every 100 iterations so you can watch it learn.
model.fit(X_train, y_train, verbose=100)
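The effect of verbose=100 can be mimicked with a toy training loop that logs every 100th iteration (the decreasing loss value here is a stand-in, not real training) −

```python
def train_with_logging(iterations=500, verbose=100):
    """Toy loop that reports progress every `verbose` iterations."""
    logged = []
    for i in range(1, iterations + 1):
        loss = 1.0 / i  # stand-in for a decreasing training loss
        if i % verbose == 0:
            print(f"iteration {i}: loss = {loss:.4f}")
            logged.append(i)
    return logged

train_with_logging()
```

Watching the logged loss is useful: if it stops improving long before the final iteration, you are paying for trees that add nothing.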
4. Make Predictions
After the model has been trained, you can use it to make predictions. For example, when you give it new data (X_test), it will predict the corresponding results.
preds = model.predict(X_test)
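Under the hood, a boosted binary classifier typically sums the outputs of all its trees into one raw score, squashes that score into a probability with a sigmoid, and then thresholds it. A minimal sketch (the function name and 0.5 threshold are illustrative assumptions) −

```python
import math

def predict_label(raw_score, threshold=0.5):
    """Convert the summed tree outputs into a 0/1 class label."""
    prob = 1 / (1 + math.exp(-raw_score))  # sigmoid maps score to (0, 1)
    return 1 if prob >= threshold else 0

print(predict_label(2.0), predict_label(-1.0))  # 1 0
```

A large positive score means the trees collectively vote for class 1; a negative score votes for class 0.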
5. Evaluate the Model
After making your predictions, you have to evaluate the model's performance. One way to do this is by checking how accurate the model's predictions are.
accuracy = accuracy_score(y_test, preds)
print(f"Accuracy is: {accuracy}")
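What accuracy_score computes is simply the fraction of predictions that match the true labels; a plain-Python equivalent makes that concrete −

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75 (3 of 4 correct)
```

Keep in mind that accuracy can be misleading on imbalanced data, which is why the larger example below also prints a full classification report.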
Example to use CatBoostClassifier
We will use the Housing.csv dataset to show how to create a CatBoost Classifier model, following the steps above. The price column is continuous, so we first turn it into a binary label (above or below the median price) − a classifier needs discrete classes, not raw prices. Note also that furnishingstatus has more than two categories, so it is label-encoded rather than mapped to yes/no −
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the dataset
data = pd.read_csv('/Python/Housing.csv')

# 2. Preprocess the data: convert the yes/no columns to numeric
binary_features = ['mainroad', 'guestroom', 'basement',
                   'hotwaterheating', 'airconditioning', 'prefarea']
for col in binary_features:
    data[col] = data[col].map({'yes': 1, 'no': 0})

# furnishingstatus has three categories, so label-encode it
data['furnishingstatus'] = pd.factorize(data['furnishingstatus'])[0]

# 3. Build the target: is the house priced above the median?
y = (data['price'] > data['price'].median()).astype(int)
X = data.drop('price', axis=1)

# 4. Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. Train the model
model = CatBoostClassifier(iterations=100, depth=6, learning_rate=0.1,
                           random_seed=42, verbose=0)
model.fit(X_train, y_train)

# 6. Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:")
print(report)
Output
The script prints the overall accuracy followed by a classification report with precision, recall, F1-score, and support for the two classes (0 = below the median price, 1 = above it); the exact figures depend on the dataset and the random split.