goal
Impute missing values cleanly with scikit-learn's SimpleImputer.
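minimal code
from sklearn.impute import SimpleImputer
import numpy as np
# Learn the column mean from the observed values (1 and 3), ignoring the NaN.
imp = SimpleImputer(strategy="mean").fit([[1], [np.nan], [3]])
# A new missing value is replaced by that learned mean.
print(float(imp.transform([[np.nan]])[0, 0]))  # 2.0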
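usage
import numpy as np
from sklearn.impute import SimpleImputer
# String columns need an object-dtype array, and None must be declared as the missing marker.
X = np.array([["a"], ["a"], ["b"]], dtype=object)
imp = SimpleImputer(missing_values=None, strategy="most_frequent").fit(X)
print(imp.transform(np.array([[None]], dtype=object))[0, 0])  # most frequent value: "a"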
useful variant(s)
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
# Chain imputation, scaling, and regression so that all steps are fitted together.
pipe = Pipeline([("imp", SimpleImputer()), ("sc", StandardScaler()), ("lr", LinearRegression())])
print(hasattr(pipe, "fit"))  # True
notes
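- Fit the imputer on the training set only. Placing it inside a Pipeline does this for you: fit() learns the fill values from the training data, and predict() reuses them on new data.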
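
The note above can be checked with a small sketch. The toy arrays (X_train, y_train, X_test) are made up for illustration and are not part of the original snippets. The idea is that, after fitting the Pipeline on the training data, the imputer's learned fill value should match the training mean, and a missing test value should be filled with that same number rather than with a statistic computed on the test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Toy data (hypothetical, for illustration only): one feature with missing values.
X_train = np.array([[1.0], [np.nan], [3.0], [5.0]])
y_train = np.array([1.0, 2.0, 3.0, 5.0])
X_test = np.array([[np.nan], [100.0]])

pipe = Pipeline([("imp", SimpleImputer(strategy="mean")), ("lr", LinearRegression())])
pipe.fit(X_train, y_train)  # the imputer learns its mean from X_train only

# The learned fill value is the mean of the observed training values: (1 + 3 + 5) / 3.
train_mean = float(pipe.named_steps["imp"].statistics_[0])
print(train_mean)

# The test NaN is replaced by the training mean; the test value 100.0 plays no part.
filled_test = pipe.named_steps["imp"].transform(X_test)
print(filled_test[0, 0] == train_mean)

# predict() applies the same fitted imputer before the regression step.
preds = pipe.predict(X_test)
```

Calling fit_transform on the full dataset before splitting would instead let test values influence the fill value, which is what the Pipeline avoids.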