
shap: expected value (baseline)

Understand the SHAP base value and how feature contributions add up to a prediction.

objective

Check that a SHAP explanation is additive: the base value (expected_value) plus the sum of the per-feature contributions reproduces the model's prediction.

minimal code

import numpy as np
import shap
from xgboost import XGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1,
    subsample=0.9, colsample_bytree=0.9, tree_method="hist", random_state=0,
).fit(X_train, y_train)

expl = shap.TreeExplainer(model)
sv = expl.shap_values(X_val[:1])  # shape (1, n_features)
# expected_value may be a scalar or a length-1 array depending on the shap version
base = float(np.ravel(expl.expected_value)[0])
pred = model.predict(X_val[:1])[0]
contrib = base + sv[0].sum()
# XGBoost predicts in float32, so compare with a tolerance looser than 1e-6
print(np.isclose(contrib, pred, atol=1e-3))

usage

# Mean absolute SHAP value per feature (a global importance score)
import numpy as np
imp = np.abs(expl.shap_values(X_val[:100])).mean(0)
print(imp.shape[0] == X.shape[1])
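
The mean absolute value is used rather than the plain mean because positive and negative contributions of the same feature would otherwise cancel out. A minimal sketch of turning such a vector into a ranking, using a small made-up SHAP matrix and hypothetical feature names (neither comes from the model above):

```python
import numpy as np

# Hypothetical SHAP matrix: 4 samples x 3 features (made-up values).
shap_matrix = np.array([
    [ 0.5, -2.0,  0.1],
    [-0.7,  1.5,  0.0],
    [ 0.2, -1.0, -0.3],
    [-0.6,  2.5,  0.2],
])
feature_names = ["age", "bmi", "bp"]  # hypothetical labels

# Global importance: average magnitude of each feature's contribution.
importance = np.abs(shap_matrix).mean(axis=0)

# Signed mean, for comparison: opposite-sign contributions cancel.
signed_mean = shap_matrix.mean(axis=0)

# Rank features from most to least important.
order = np.argsort(importance)[::-1]
for j in order:
    print(f"{feature_names[j]}: mean|SHAP|={importance[j]:.3f}, mean SHAP={signed_mean[j]:.3f}")
```

In this toy matrix the second feature swings between large positive and negative contributions, so its signed mean is small while its mean absolute value is the largest.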

useful variant(s)

# Visualizations: bar, waterfall
# shap.plots.bar(shap.Explanation(values=sv, base_values=base, data=X_val[:1]))
print(True)

notes

  • In regression, the base value is approximately the mean prediction over the reference (background) dataset; SHAP decomposes each prediction as prediction = base + sum(contributions).
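
The decomposition in the note can be checked without any library by computing exact Shapley values by brute force. The sketch below is an illustration only, not how TreeExplainer works internally: the `model` function, the random background set, and the helper names `coalition_value` and `exact_shapley` are all hypothetical. It assumes an interventional value function, where features outside a coalition keep their background values.

```python
from itertools import combinations
from math import factorial

import numpy as np


def model(X):
    # Hypothetical nonlinear model standing in for a trained regressor.
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2] - 0.5 * X[:, 2] ** 2


def coalition_value(x, background, subset):
    # Mean prediction when the features in `subset` are taken from x
    # and every other feature keeps its background value.
    mixed = background.copy()
    idx = np.array(subset, dtype=int)
    mixed[:, idx] = x[idx]
    return model(mixed).mean()


def exact_shapley(x, background):
    # Weighted average of each feature's marginal gain over all coalitions.
    n = x.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(x, background, subset + (i,))
                without_i = coalition_value(x, background, subset)
                phi[i] += weight * (with_i - without_i)
    return phi


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))  # reference dataset (made up)
x = np.array([1.0, -0.5, 2.0])         # instance to explain (made up)

base = model(background).mean()        # base value = mean prediction on the reference set
phi = exact_shapley(x, background)
pred = model(x[None, :])[0]
print(np.isclose(base + phi.sum(), pred))
```

The cost grows exponentially with the number of features, which is why this is only practical for a handful of features; it is meant to make the roles of the base value and the contributions concrete.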