objective
Detect and remove duplicates.
minimal code
import pandas as pd
s = pd.Series([1,1,2,3])
print(s.duplicated().tolist()[1])  # True: the second 1 repeats the first
usage
import pandas as pd
df = pd.DataFrame({"a":[1,1,2],"b":[0,0,0]})
print(df.drop_duplicates().shape[0])  # 2: the two identical (1, 0) rows collapse into one
useful variant(s)
import pandas as pd
df = pd.DataFrame({"a":[1,1,2],"t":[1,2,3]})
# For each value of "a", keep the row with the largest "t": prints [2, 3]
print(df.sort_values("t").drop_duplicates("a", keep="last")["t"].tolist())
notes
- drop_duplicates(subset=cols, keep='first'|'last'|False): keep the first or last row of each duplicate group, or drop every duplicated row with keep=False.
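A minimal sketch comparing the three keep options on one small frame. The frame, the cols list, and the variable names are illustrative assumptions, not part of the original snippets.

```python
import pandas as pd

# Illustrative data: "a" holds the key with a repeated value, "t" tags each row.
df = pd.DataFrame({"a": [1, 1, 2], "t": [1, 2, 3]})
cols = ["a"]  # columns used to decide whether two rows are duplicates

# keep="first": retain the first row of each duplicate group.
first = df.drop_duplicates(subset=cols, keep="first")["t"].tolist()
# keep="last": retain the last row of each duplicate group.
last = df.drop_duplicates(subset=cols, keep="last")["t"].tolist()
# keep=False: drop every row whose key appears more than once.
none_kept = df.drop_duplicates(subset=cols, keep=False)["t"].tolist()

print(first, last, none_kept)  # [1, 3] [2, 3] [3]
```

Without a prior sort, 'first' and 'last' follow the existing row order, which is why the variant above sorts by "t" before dropping.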