lightgbm: classifier and categorical features

Use LGBMClassifier with categorical columns and early stopping.

objective

Train an LGBMClassifier on pandas columns of dtype "category", with a validation set and early stopping.

minimal code

import pandas as pd
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier, early_stopping

# Toy data: "cat" uses the pandas "category" dtype
df = pd.DataFrame({
    "cat": pd.Series(["a","b","a","c"], dtype="category"),
    "num": [1.0, 2.0, 3.0, 4.0],
    "y":   [0, 1, 0, 1],
})
X = df[["cat","num"]]
y = df["y"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)

clf = LGBMClassifier(n_estimators=1000, learning_rate=0.1, random_state=0)
clf.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    eval_metric="logloss",
    categorical_feature=["cat"],
    # Stop once the validation logloss has not improved for 50 rounds
    # (50 is an illustrative value, not a recommendation)
    callbacks=[early_stopping(stopping_rounds=50)],
)
print(hasattr(clf, "predict_proba"))

usage

# LightGBM can detect "category" columns automatically (categorical_feature defaults to "auto")
X2 = X.copy()
X2["cat"] = X2["cat"].astype("category")
clf2 = LGBMClassifier(n_estimators=200, random_state=0)
clf2.fit(X2, y)
print(clf2.classes_.tolist())

useful variant(s)

# Feature importances
imp = clf.booster_.feature_importance(importance_type="gain")
print(len(imp) == X.shape[1])

notes

  • With dtype "category", LightGBM handles categories natively and efficiently; combine eval_set with the early_stopping callback, passed through the callbacks argument of fit.