objective

Train an XGBoost classifier with a validation set and early stopping.

minimal code

from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

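# eval_metric and early_stopping_rounds are set on the estimator:
# passing them to fit() is deprecated since xgboost 1.6 and rejected by 2.0+.
clf = XGBClassifier(
    n_estimators=2000,  # upper bound; early stopping decides the actual count
    max_depth=4,
    learning_rate=0.05,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_lambda=1.0,
    tree_method="hist",
    random_state=0,
    eval_metric="logloss",
    early_stopping_rounds=50,
)
clf.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],  # monitored set for early stopping
    verbose=False,
)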
y_pred_proba = clf.predict_proba(X_val)[:, 1]
print(round(log_loss(y_val, y_pred_proba), 4))
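
For readers unfamiliar with the metric printed above, here is a minimal pure-Python sketch of binary log loss. The function name `binary_log_loss`, the clipping value `eps` and the sample labels and probabilities are illustrative assumptions, not part of the original snippet or of scikit-learn's implementation.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite (eps is an assumption).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class.
example_loss = binary_log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
print(round(example_loss, 4))
```

Confident and correct predictions push the value toward 0, while confident and wrong predictions are penalized heavily.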

usage

# Validation metrics are available through clf.evals_result()
hist = clf.evals_result()["validation_0"]["logloss"]
print(len(hist), hist[-1] <= hist[0])

useful variant(s)

# Reuse the best iteration to retrain on train+val
best_n = clf.best_iteration + 1  # best_iteration is 0-based

final = XGBClassifier(
    n_estimators=best_n,
    max_depth=clf.max_depth,
    learning_rate=clf.learning_rate,
    subsample=clf.subsample,
    colsample_bytree=clf.colsample_bytree,
    reg_lambda=clf.reg_lambda,
    tree_method="hist",
    random_state=0,
)
final.fit(X, y)
print(hasattr(final, "predict_proba"))

notes

Set random_state for reproducibility and use early_stopping_rounds to limit overfitting; read best_iteration to fix the number of trees in the final model.
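
The stopping rule behind early_stopping_rounds can be illustrated with a small pure-Python sketch. It is a simplified model of the patience logic, not XGBoost's actual implementation; the function `early_stopping_point` and the validation losses are hypothetical.

```python
def early_stopping_point(losses, patience):
    """Return (best_index, stop_index) for a lower-is-better metric.

    Training stops once `patience` rounds in a row fail to improve
    on the best loss seen so far.
    """
    best_index = 0
    best_loss = float("inf")
    for i, loss in enumerate(losses):
        if loss < best_loss:
            best_loss = loss
            best_index = i
        elif i - best_index >= patience:
            return best_index, i
    # The metric never stalled long enough: all rounds were used.
    return best_index, len(losses) - 1


# Hypothetical per-round validation losses.
val_losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.36, 0.39, 0.40]
best_index, stop_index = early_stopping_point(val_losses, patience=3)
n_trees = best_index + 1  # 0-based index converted to a tree count
print(best_index, stop_index, n_trees)  # 3 6 4
```

In this toy run the best round is index 3, training stops at index 6, and the final model would keep 4 trees, which mirrors the best_iteration + 1 step used in the variant above.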
