objective
Understand SHAP's base value and how per-feature contributions add up to a prediction.
minimal code
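import numpy as np
import shap
from xgboost import XGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
# Load the diabetes regression dataset and hold out 20% for validation
X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1,
    subsample=0.9, colsample_bytree=0.9,
    tree_method="hist", random_state=0,
).fit(X_train, y_train)
expl = shap.TreeExplainer(model)
sv = expl.shap_values(X_val[:1])  # shape (1, n_features)
# expected_value can be a scalar or a length-1 array depending on the shap version
base = float(np.ravel(expl.expected_value)[0])
pred = model.predict(X_val[:1])[0]
contrib = base + sv[0].sum()
# Additivity check: base + sum of contributions should match the prediction.
# The tolerance allows for float32 rounding in XGBoost's output.
print(np.isclose(contrib, pred, atol=1e-3))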
usage
# Mean of |SHAP| per feature, a simple global importance score
import numpy as np
imp = np.abs(expl.shap_values(X_val[:100])).mean(0)
print(imp.shape[0] == X.shape[1])
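useful variant(s)
# Visualizations: bar, waterfall
# base_values needs one entry per explained row
# shap.plots.bar(shap.Explanation(values=sv, base_values=np.full(len(sv), base), data=X_val[:1]))
# waterfall takes a single-row Explanation
# shap.plots.waterfall(expl(X_val[:1])[0])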
print(True)
notes
- In regression, base is approximately the mean of the model's predictions over the reference (background) dataset; SHAP decomposes each prediction as prediction = base + sum(contributions).
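
The decomposition in the note above can be checked by hand on a tiny case. The sketch below is not how TreeExplainer computes its values; it is a brute-force Shapley calculation in plain Python. It assumes a hypothetical three-feature toy model (`model`), a made-up four-row background set (`background`), and a made-up instance `x`. Features outside a coalition are filled in from each background row, so the empty coalition gives the base value as the mean prediction over the background.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical toy model with an interaction term (not the XGBoost model above)
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]

# Made-up reference (background) rows and the instance to explain
background = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 2.0, 0.0],
    [1.0, 3.0, 1.0],
]
x = [3.0, 1.0, 2.0]
n = len(x)

def value(subset):
    # Mean model output when features in `subset` take x's values
    # and the remaining features come from each background row
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(n)]
        total += model(mixed)
    return total / len(background)

def shapley(i):
    # Weighted average of feature i's marginal contribution over all coalitions
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for k in range(n):
        for s in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (value(set(s) | {i}) - value(set(s)))
    return phi

base = value(set())  # mean prediction over the background rows
contributions = [shapley(i) for i in range(n)]
prediction = model(x)
additivity_gap = abs(base + sum(contributions) - prediction)
print(additivity_gap < 1e-9)
```

Enumerating every coalition costs exponential time in the number of features, so this only works for toy sizes. Tree-specific algorithms exist precisely to avoid that enumeration.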