← back to snippets

xgboost: importance (gain) and SHAP

Compare gain-based feature importance with SHAP values.

python explain #xgboost #shap #importance

objective

Compare gain-based feature importance with SHAP values.

minimal code

import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(X.shape[1])])
bst = xgb.train({"objective": "binary:logistic", "tree_method": "hist", "seed": 0},
                dtrain, num_boost_round=200)

# Gain-based importance: average loss reduction from splits that use the feature
gain_imp = bst.get_score(importance_type="gain")
print(len(gain_imp) > 0)  # True

usage

# SHAP via TreeExplainer: one row per sample, one column per feature
expl = shap.TreeExplainer(bst)
sv = expl.shap_values(X[:100])
print(len(sv) == 100)  # True

useful variant(s)

# Global SHAP importance: mean of |shap| over samples, per feature
import numpy as np
shap_imp = np.abs(sv).mean(0)
print(shap_imp.shape[0] == X.shape[1])  # True

notes

  • ‘gain’ reflects how a feature is used in the tree splits (average loss reduction); SHAP reflects the feature’s average local contribution to individual predictions.