objective
Encode categorical values as indicator (dummy) variables.
minimal code
import pandas as pd
df = pd.DataFrame({"color": ["red", "blue", "red"]})
# drop_first=True drops the first level in sorted order ("blue"), leaving only color_red
enc = pd.get_dummies(df, columns=["color"], drop_first=True)
print(sorted(enc.columns))
usage
import pandas as pd
s = pd.Series(["a", "b", "a"])
# One indicator column per distinct value ("a" and "b")
print(pd.get_dummies(s).shape[1])
useful variant(s)
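import pandas as pd
# "?" is a literal string here, not a missing value; dummy_na=True still adds an indicator column
df = pd.DataFrame({"city": ["paris", "?"]})
enc = pd.get_dummies(df, dummy_na=True)
# The NaN indicator column gets a lowercase "nan" suffix (city_nan), so filter on "nan"
print(enc.filter(like="nan").shape[1] == 1)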
notes
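Use drop_first=True to avoid perfect collinearity in regression: with an intercept, a full set of indicator columns always sums to 1 (the dummy variable trap).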
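
The collinearity point can be checked numerically. The sketch below is illustrative only: the three-color data and the explicit intercept column are assumptions, not part of the snippets above. It compares the rank of the design matrix with and without drop_first=True.

```python
import numpy as np
import pandas as pd

# Hypothetical data with three levels, chosen only for illustration
df = pd.DataFrame({"color": ["red", "blue", "green", "red"]})

full = pd.get_dummies(df, columns=["color"], dtype=float)
reduced = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=float)

# Prepend an intercept column, as a regression with a constant term would
intercept = np.ones(len(df))
X_full = np.column_stack([intercept, full.to_numpy()])
X_reduced = np.column_stack([intercept, reduced.to_numpy()])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

# Full encoding: more columns than rank, so the matrix is rank deficient
print(X_full.shape[1], rank_full)
# Reduced encoding: column count equals rank, so the columns are independent
print(X_reduced.shape[1], rank_reduced)
```

With the full encoding the indicator columns add up to the intercept column, so one column is redundant; dropping the first level removes that redundancy and makes the dropped level the baseline.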