LightGBM with Categorical Features
Classification using LightGBM with categorical features.
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
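# Sample data: one numeric feature, one categorical feature, binary label
data = pd.DataFrame({'feature1':[1,2,3,4,5], 'feature2':['A','B','A','B','C'], 'label':[0,1,0,1,0]})
# LightGBM treats pandas 'category' columns as categorical features
data['feature2'] = data['feature2'].astype('category')
X = data[['feature1','feature2']]
y = data['label']
# With only 5 rows, test_size=0.2 leaves a single test row, so the accuracy below is illustrative only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# feature2 is named explicitly here; its 'category' dtype would also be detected automatically
train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=['feature2'])
# Defaults such as min_data_in_leaf=20 allow no splits on 4 training rows,
# so they are lowered for this toy example only
params = {'objective':'binary', 'metric':'binary_logloss',
          'min_data_in_leaf':1, 'min_data_in_bin':1, 'min_data_per_group':1, 'verbose':-1}
model = lgb.train(params, train_data, num_boost_round=50)
# For the binary objective, predict() returns positive-class probabilities
y_pred = model.predict(X_test)
y_pred_labels = (y_pred > 0.5).astype(int)
print('Accuracy:', accuracy_score(y_test, y_pred_labels))
LightGBM Language Guide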
LightGBM (Light Gradient Boosting Machine) is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks.
Primary Use Cases
- Binary and multiclass classification
- Regression problems
- Ranking tasks (learning-to-rank)
- Feature selection and importance analysis
- Integration in ML pipelines for large-scale structured data
Notable Features
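- Faster training through a histogram-based decision tree algorithm
- Lower memory usage than pre-sorting approaches (such as XGBoost's exact method), since continuous values are stored as discrete histogram bins
- Supports parallel, distributed, and GPU learning
- Handles categorical features natively, without one-hot encoding
- Scales efficiently to large datasets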
Origin & Creator
LightGBM was developed by Microsoft’s DMTK team and released in 2016 to provide a faster, more memory-efficient gradient boosting framework than existing solutions.
Industrial Note
LightGBM is widely used in Kaggle competitions, finance, advertising, and recommendation systems, and more generally wherever fast gradient boosting on large datasets is needed.