
pandas: semi-join and anti-join

Simulate semi-joins and anti-joins with merge(indicator=True).

python pandas #pandas#join#semi

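objective

Keep the rows of a whose key appears in b (semi-join) or does not appear in b (anti-join), using merge(indicator=True) to tell matched rows from unmatched ones.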

minimal code

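import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3]})
b = pd.DataFrame({"id": [2, 4]})
# Deduplicate b's keys so the left merge yields exactly one row per row of a
m = a.merge(b[["id"]].drop_duplicates(), on="id", how="left", indicator=True)
# .to_numpy() makes the mask positional, so it works whatever a's index is
semi = a[(m["_merge"] == "both").to_numpy()]       # rows of a matched in b
anti = a[(m["_merge"] == "left_only").to_numpy()]  # rows of a absent from b
print(semi["id"].tolist(), anti["id"].tolist())    # [2] [1, 3]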
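The same pattern extends to a key made of several columns by passing a list to on=. A minimal sketch, wrapped in a hypothetical helper semi_anti (not a pandas function) and applied to made-up orders/refunds frames; it relies on how="left" preserving the row order of the left frame, so the indicator can be used as a positional mask.

```python
import pandas as pd


def semi_anti(left: pd.DataFrame, right: pd.DataFrame, on: list[str]):
    """Hypothetical helper: return (semi, anti) of left against right on key columns `on`."""
    # Keep only the key columns of `right` and drop duplicate keys,
    # so the left merge yields exactly one row per row of `left`.
    keys = right[on].drop_duplicates()
    m = left.merge(keys, on=on, how="left", indicator=True)
    # Positional boolean mask: how="left" keeps the row order of `left`.
    matched = (m["_merge"] == "both").to_numpy()
    return left[matched], left[~matched]


# Illustrative data: a row matches only if BOTH client and year match.
orders = pd.DataFrame({"client": [1, 1, 2, 3], "year": [2023, 2024, 2024, 2024]})
refunds = pd.DataFrame({"client": [1, 1, 3], "year": [2024, 2024, 2023]})

semi, anti = semi_anti(orders, refunds, on=["client", "year"])
print(semi.to_numpy().tolist())  # [[1, 2024]]
print(anti.to_numpy().tolist())  # [[1, 2023], [2, 2024], [3, 2024]]
```

The duplicated (1, 2024) row in refunds does not duplicate the matching order, thanks to drop_duplicates().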

usage

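import pandas as pd

a = pd.DataFrame({"k": [1, 2, 3]})
b = pd.DataFrame({"k": [3]})
# Anti-join in one chain: keep the rows of a whose key k is absent from b
anti = a.merge(b, on="k", how="left", indicator=True).query("_merge == 'left_only'")
print(anti["k"].tolist())  # [1, 2]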
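For a single-column key, the same filters can also be written with Series.isin, an alternative to the merge approach rather than part of this snippet. A minimal sketch on the same frames; it keeps a's index and row order and never adds a _merge column.

```python
import pandas as pd

a = pd.DataFrame({"k": [1, 2, 3]})
b = pd.DataFrame({"k": [3]})

# Boolean Series: True where a's key is present among b's keys
in_b = a["k"].isin(b["k"])
anti = a[~in_b]  # keys absent from b
semi = a[in_b]   # keys present in b
print(anti["k"].tolist(), semi["k"].tolist())  # [1, 2] [3]
```

For multi-column keys, the merge(indicator=True) version stays the simpler option.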

useful variant(s)

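import pandas as pd

a = pd.DataFrame({"k": [1, 2]})
b = pd.DataFrame({"k": [2, 3]})
# Semi-join via an inner merge; dropping duplicate keys in b keeps each row of a at most once
print(a.merge(b[["k"]].drop_duplicates(), on="k", how="inner")["k"].tolist())  # [2]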
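An inner merge on its own repeats a row of a once per matching row of b, so it behaves as a semi-join only when b's keys are unique. A minimal sketch with a duplicated key (illustrative frames) comparing a plain inner merge with one on deduplicated keys:

```python
import pandas as pd

a = pd.DataFrame({"k": [1, 2]})
b = pd.DataFrame({"k": [2, 2, 3]})  # key 2 appears twice in b

# Plain inner merge: the row k=2 of a comes out once per match in b
dup = a.merge(b, on="k", how="inner")
# Deduplicated keys: each row of a appears at most once (true semi-join)
semi = a.merge(b[["k"]].drop_duplicates(), on="k", how="inner")
print(dup["k"].tolist(), semi["k"].tolist())  # [2, 2] [2]
```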

notes

  • indicator=True adds a categorical _merge column recording where each joined row comes from: "left_only", "right_only" or "both".
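With how="left" the value "right_only" never occurs. A minimal sketch with how="outer", which keeps keys from both sides so all three values can appear; passing a string such as "source" (an arbitrary name chosen here) instead of True renames the indicator column.

```python
import pandas as pd

a = pd.DataFrame({"k": [1, 2]})
b = pd.DataFrame({"k": [2, 3]})

# Outer merge keeps keys 1, 2 and 3; the indicator column is named "source"
m = a.merge(b, on="k", how="outer", indicator="source")
print(m["k"].tolist())       # [1, 2, 3]
print(m["source"].tolist())  # ['left_only', 'both', 'right_only']
```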