sklearn: Brier score and calibration
goal
Explain and show how to evaluate and calibrate predicted probabilities using the Brier score.
minimal code
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=5000, n_features=20, random_state=0, weights=[0.8, 0.2])
# stratify keeps the 0.8/0.2 class ratio in both splits
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
p_raw = rf.predict_proba(Xte)[:, 1]
b_raw = brier_score_loss(yte, p_raw)
# cv=3 refits clones of rf on the folds; the pre-fit rf above is only used for p_raw
cal = CalibratedClassifierCV(rf, method="isotonic", cv=3).fit(Xtr, ytr)
p_cal = cal.predict_proba(Xte)[:, 1]
b_cal = brier_score_loss(yte, p_cal)
usage
print("brier raw vs cal:", b_raw, b_cal)
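The Brier score is simply the mean squared error between the predicted probability and the 0/1 outcome, Brier = mean((p - y)^2). A small standalone check (with made-up illustrative probabilities) shows that brier_score_loss matches the manual computation:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# toy labels and predicted probabilities (illustrative values only)
y_true = np.array([0, 1, 1, 0])
p = np.array([0.1, 0.9, 0.8, 0.3])

manual = np.mean((p - y_true) ** 2)  # mean squared error against the 0/1 outcome
print(manual, brier_score_loss(y_true, p))  # both 0.0375
```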
useful variant(s)
# 'sigmoid' method (Platt scaling)
cal_sig = CalibratedClassifierCV(rf, method="sigmoid", cv=3).fit(Xtr, ytr)
print(cal_sig.predict_proba(Xte)[:3])
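To choose between the two methods empirically, it helps to compute the Brier score for both calibrated models. A self-contained sketch reusing the same synthetic setup as the minimal code above:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0, weights=[0.8, 0.2])
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# CalibratedClassifierCV clones and fits rf internally, so no prior .fit() is needed
for method in ("isotonic", "sigmoid"):
    cal = CalibratedClassifierCV(rf, method=method, cv=3).fit(Xtr, ytr)
    b = brier_score_loss(yte, cal.predict_proba(Xte)[:, 1])
    print(f"{method}: {b:.4f}")
```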
notes
- A lower Brier score means better probability estimates; it rewards both calibration and sharpness.
- Isotonic regression can overfit on small datasets; sigmoid (Platt scaling) is more robust there.