objective
Split data into training and test sets with stratification, so that each split keeps the class proportions of y.
minimal code
from sklearn.model_selection import train_test_split
X = [[0],[1],[2],[3]]; y = [0,0,1,1]
# stratify=y keeps the 50/50 class ratio in both halves (one sample of each class per half)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)
print(len(ytr) == len(yte) == 2)
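The balanced toy example above cannot show much, so here is a sketch on a hypothetical imbalanced dataset (8 samples of class 0, 4 of class 1, a 2:1 ratio). It checks that `stratify=y` keeps that ratio in both the training and test parts.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 8 samples of class 0, 4 of class 1 (2:1 ratio).
X = [[i] for i in range(12)]
y = [0] * 8 + [1] * 4

# test_size=0.25 -> 3 test samples, 9 training samples; stratify=y preserves 2:1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(Counter(y_train))  # 6 samples of class 0, 3 of class 1
print(Counter(y_test))   # 2 samples of class 0, 1 of class 1
```

Without `stratify=y`, a small test set like this one could easily end up with no class 1 sample at all.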
usage
from sklearn.model_selection import StratifiedKFold
# shuffle=True is needed for random_state to have any effect
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
print(hasattr(skf, "split"))
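Checking `hasattr(skf, "split")` only confirms the object exists. A minimal sketch of actually iterating over the folds, on hypothetical toy data with 4 samples per class, shows that each test fold keeps the class balance and that every sample is used for testing exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 4 samples per class.
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

test_folds = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Per-class counts in this test fold: stratification gives 2 of each class here.
    print(fold, np.bincount(y[test_idx]))
    test_folds.append(test_idx)

# Every sample lands in exactly one test fold.
all_test = np.sort(np.concatenate(test_folds))
print((all_test == np.arange(len(y))).all())
```

In practice the loop body would fit a model on `X[train_idx]` and evaluate it on `X[test_idx]`.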
useful variant(s)
from sklearn.model_selection import TimeSeriesSplit
# for time-ordered data: no shuffling, each training set comes before its test set
ts = TimeSeriesSplit(n_splits=3)
print(hasattr(ts, "split"))
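To see what `TimeSeriesSplit` produces, here is a sketch on 8 hypothetical samples assumed to be in chronological order. The training window grows at each split and always ends before the test window starts.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy data: 8 samples assumed to be in chronological order.
X = np.arange(8).reshape(-1, 1)
ts = TimeSeriesSplit(n_splits=3)

splits = list(ts.split(X))
for train_idx, test_idx in splits:
    # Training indices always precede the test indices (no look-ahead).
    print("train:", train_idx, "test:", test_idx)
```

With 8 samples and 3 splits, the test windows are `[2 3]`, `[4 5]` and `[6 7]`, each preceded by every earlier index as training data. This split does not use `y` and does not stratify; it preserves temporal order instead.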
notes
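- Always set random_state for reproducibility. It only matters where the data is shuffled: train_test_split shuffles by default, StratifiedKFold only with shuffle=True, and TimeSeriesSplit never shuffles (it has no random_state parameter).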
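A quick way to check reproducibility, sketched on hypothetical toy data: two calls to `train_test_split` with the same `random_state` return identical splits.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 5 samples per class.
X = [[i] for i in range(10)]
y = [0] * 5 + [1] * 5

# Same random_state -> same shuffle -> identical train/test split.
split_a = train_test_split(X, y, test_size=0.4, stratify=y, random_state=42)
split_b = train_test_split(X, y, test_size=0.4, stratify=y, random_state=42)
print(split_a == split_b)
```

With `random_state=None` the two calls would draw independent shuffles, so the splits could differ from run to run.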