pandas: duplicated and drop_duplicates

Detect and remove duplicates.

python pandas #pandas #duplicates #clean

objective

Flag repeated values with duplicated() and remove duplicate rows with drop_duplicates().

minimal code

import pandas as pd
s = pd.Series([1, 1, 2, 3])
# duplicated() flags each value already seen earlier; index 1 holds the second 1
print(s.duplicated().tolist()[1])  # True
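
A minimal sketch of how the boolean mask returned by duplicated() might be used to filter a Series. The variable names (mask, mask_all, first_only, repeated) are illustrative and not part of the original snippet.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3])

# Default keep="first": every repeat after the first occurrence is marked True
mask = s.duplicated()

# keep=False marks all members of a duplicated group, including the first one
mask_all = s.duplicated(keep=False)

# Invert the mask to keep only first occurrences
first_only = s[~mask].tolist()

# Select every value that appears more than once
repeated = s[mask_all].tolist()

print(mask.tolist(), first_only, repeated)
```

Filtering with ~mask gives the same values as s.drop_duplicates(), while the keep=False mask is handy for inspecting the duplicated entries before deleting anything.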

usage

import pandas as pd
df = pd.DataFrame({"a": [1, 1, 2], "b": [0, 0, 0]})
# rows 0 and 1 are identical, so two distinct rows remain
print(df.drop_duplicates().shape[0])  # 2

useful variant(s)

import pandas as pd
df = pd.DataFrame({"a": [1, 1, 2], "t": [1, 2, 3]})
# sort by t, then keep the latest row for each value of a
print(df.sort_values("t").drop_duplicates("a", keep="last")["t"].tolist())  # [2, 3]

notes

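  • drop_duplicates(subset=cols, keep='first'|'last'|False): subset limits the comparison to the given columns; keep selects which occurrence survives, and keep=False drops every row that has a duplicate.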
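
A short sketch comparing the three keep options on the same frame. The column names a and t are reused from the variant above, and the result names (first, last, none) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "t": [1, 2, 3]})

# keep="first": the earliest row of each group of equal "a" values survives
first = df.drop_duplicates(subset=["a"], keep="first")["t"].tolist()

# keep="last": the latest row of each group survives
last = df.drop_duplicates(subset=["a"], keep="last")["t"].tolist()

# keep=False: every row whose "a" value is repeated is dropped
none = df.drop_duplicates(subset=["a"], keep=False)["t"].tolist()

print(first, last, none)
```

Here only the row with a == 2 is left under keep=False, since both rows with a == 1 count as duplicates of each other.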