pandas: keep first/last row per group

Emulate SQL 'keep first/last' with sort_values + drop_duplicates.

python pandas #pandas #groupby #keep

goal

Emulate SQL-style 'keep the first/last row per group': sort the rows, then call drop_duplicates on the group key with keep="first" or keep="last".

minimal code

import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "ts": [2, 1, 1], "val": [20, 10, 30]})
# Sort by group key, then timestamp; keep the earliest row of each id.
first = df.sort_values(["id", "ts"]).drop_duplicates("id", keep="first")
print(first["val"].tolist())  # [10, 30]

usage

import pandas as pd

df = pd.DataFrame({"g": [1, 1, 1], "v": [3, 1, 2]})
# After sorting by v within each group, keep="last" retains the largest v.
last = df.sort_values(["g", "v"]).drop_duplicates("g", keep="last")
print(last["v"].iloc[0])  # 3

useful variant(s)

import pandas as pd

df = pd.DataFrame({"g": [1, 1, 2], "v": [1, 1, 2]})
# Deduplicate on several columns at once: counts the distinct (g, v) pairs.
print(df.drop_duplicates(["g", "v"]).shape[0])  # 2

notes

  • To break ties, add further sort keys so that the order within each group is fully determined.
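
The tie-breaking note can be sketched as follows. This is only an illustration under stated assumptions: the data and the "seq" column are hypothetical and do not come from the original snippet. Two rows for id=1 share the same ts, so a third sort key decides which one counts as the last.

```python
import pandas as pd

# Hypothetical data: two rows for id=1 have ts=2, so sorting on
# ["id", "ts"] alone does not say which of them is the "last" one.
df = pd.DataFrame({
    "id":  [1, 1, 1, 2],
    "ts":  [1, 2, 2, 1],
    "seq": [0, 2, 1, 0],      # assumed tie-breaker column
    "val": [10, 21, 20, 30],
})

# Adding "seq" as a third sort key makes the order inside each (id, ts) tie explicit.
last = df.sort_values(["id", "ts", "seq"]).drop_duplicates("id", keep="last")
print(last["val"].tolist())  # [21, 30]
```

With the extra key, the id=1 row with the highest seq among the tied timestamps is the one retained.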