objective
Use LGBMClassifier with categorical columns and early stopping.
minimal code
import pandas as pd
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier, early_stopping
# Toy data: four rows exercise the API but are far too few to learn from.
df = pd.DataFrame({
    "cat": pd.Series(["a","b","a","c"], dtype="category"),
    "num": [1.0, 2.0, 3.0, 4.0],
    "y": [0, 1, 0, 1],
})
X = df[["cat","num"]]
y = df["y"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
clf = LGBMClassifier(n_estimators=1000, learning_rate=0.1, random_state=0)
clf.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    eval_metric="logloss",
    categorical_feature=["cat"],
    # Stop once the validation log loss has not improved for 10 rounds.
    callbacks=[early_stopping(stopping_rounds=10)],
)
print(hasattr(clf, "predict_proba"))
usage
# LightGBM can detect "category" dtype columns automatically
X2 = X.copy()
# Already categorical here; the cast matters when the column arrives as plain strings.
X2["cat"] = X2["cat"].astype("category")
clf2 = LGBMClassifier(n_estimators=200, random_state=0)
clf2.fit(X2, y)
print(clf2.classes_.tolist())
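Automatic detection relies on the column being categorical, so data scored later needs the same treatment. A minimal pandas-only sketch, assuming new rows arrive as plain strings: reuse the training column's dtype so that the category-to-code mapping stays identical. The frame names train and new are hypothetical.

```python
import pandas as pd

train = pd.DataFrame({"cat": pd.Series(["a", "b", "a", "c"], dtype="category")})
new = pd.DataFrame({"cat": ["c", "a", "d"]})  # "d" was never seen in training

# Reuse the exact CategoricalDtype learned on the training column.
cat_dtype = train["cat"].dtype
new["cat"] = new["cat"].astype(cat_dtype)

print(new["cat"].cat.categories.tolist())  # ['a', 'b', 'c']
print(new["cat"].cat.codes.tolist())       # [2, 0, -1]; unseen "d" becomes missing
```

Unseen values turn into missing values rather than raising, so it is worth counting them before predicting.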
useful variant(s)
# Feature importances
imp = clf.booster_.feature_importance(importance_type="gain")
print(len(imp) == X.shape[1])
notes
- With dtype 'category', LightGBM handles categories natively and efficiently; to get early stopping, combine eval_set with an early_stopping callback passed through the callbacks argument of fit.
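To make the early-stopping rule concrete, here is a library-independent sketch of a patience check over a list of validation losses. It illustrates the general idea only and is not LightGBM's implementation. The function name best_round_with_patience and the loss values are hypothetical.

```python
def best_round_with_patience(val_losses, stopping_rounds):
    """Return (best_round, best_loss), stopping once the loss has not
    improved for `stopping_rounds` consecutive rounds (rounds are 1-based)."""
    best_loss = float("inf")
    best_round = 0
    for rnd, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, rnd
        elif rnd - best_round >= stopping_rounds:
            break  # patience exhausted
    return best_round, best_loss


losses = [0.69, 0.60, 0.55, 0.56, 0.57, 0.54, 0.58]
print(best_round_with_patience(losses, stopping_rounds=2))  # (3, 0.55)
print(best_round_with_patience(losses, stopping_rounds=3))  # (6, 0.54)
```

The two calls show the trade-off: a short patience stops at round 3 and misses the later improvement at round 6, while a longer patience finds it at the cost of extra rounds.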