with Categorical Features - CatBoost Code
Using CatBoost with categorical features in a classification task.
from catboost import CatBoostClassifier, Pool
import pandas as pd
# Sample data
data = pd.DataFrame({
    'feature_num': [1, 2, 3, 4, 5, 6],
    'feature_cat': ['A', 'B', 'A', 'B', 'C', 'C'],
    'label': [0, 1, 0, 1, 0, 1]
})
X = data[['feature_num','feature_cat']]
y = data['label']
# Define categorical features
cat_features = ['feature_cat']
# Create Pool
data_pool = Pool(X, y, cat_features=cat_features)
# Define model
model = CatBoostClassifier(iterations=50, learning_rate=0.1, depth=3, verbose=0)
# Train model
model.fit(data_pool)
# Predict
y_pred = model.predict(X)
print('Predictions:', y_pred)
CatBoost Language Guide
CatBoost (Categorical Boosting) is an open-source gradient boosting library developed by Yandex. It handles categorical features natively, without requiring manual encoding, and is designed to deliver competitive accuracy on classification, regression, and ranking tasks.
Primary Use Cases
- Binary and multiclass classification
- Regression problems
- Learning-to-rank tasks
- Handling datasets with categorical features
- Integration into machine learning pipelines for tabular data
Notable Features
- Native support for categorical features
- Ordered boosting to reduce overfitting
- CPU and GPU training
- Efficient on large-scale datasets
- Model interpretation tools, such as feature importance
Origin & Creator
CatBoost was developed by Yandex and released as open source in 2017. It was designed as a gradient boosting framework that handles categorical data efficiently while reducing prediction bias and overfitting.
Industrial Note
CatBoost is widely used in finance, recommendation systems, advertising, and other domains where tabular data contains categorical features and high predictive accuracy is needed.