objective
Process a large CSV in blocks (chunksize), streaming it instead of loading it all at once.
minimal code
import pandas as pd, io
csv = "id,val\n" + "\n".join(f"{i},{i*i}" for i in range(10))
it = pd.read_csv(io.StringIO(csv), chunksize=4)
total = 0
for chunk in it:
    total += chunk["val"].sum()
print(total)
usage
import pandas as pd, io
csv = "a\n" + "\n".join(str(i) for i in range(7))
print(sum(len(c) for c in pd.read_csv(io.StringIO(csv), chunksize=3)))
useful variant(s)
import pandas as pd
# it = pd.read_csv("big.csv", chunksize=100_000)
print("ok")
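A runnable sketch of the file-based variant above, using a temporary CSV as a stand-in for a real large file (the path and sizes here are illustrative assumptions). Using the reader as a context manager closes the file handle cleanly:

```python
import os
import tempfile

import pandas as pd

# Write a small stand-in file for "big.csv".
path = os.path.join(tempfile.mkdtemp(), "big.csv")
pd.DataFrame({"id": range(1000), "val": range(1000)}).to_csv(path, index=False)

# Stream it in blocks of 250 rows, accumulating a single aggregate.
total = 0
with pd.read_csv(path, chunksize=250) as reader:
    for chunk in reader:
        total += chunk["val"].sum()
print(total)  # 0 + 1 + ... + 999 = 499500
```

The context-manager form requires pandas 1.2+; on older versions, iterate the reader directly as in the minimal code.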
notes
- Accumulate aggregates as you go rather than storing all the chunks in memory.
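One common case where a plain running total is not enough is a per-group aggregate. A sketch of that pattern, collecting partial group sums per chunk and combining them at the end (the CSV content is synthetic illustration data):

```python
import io

import pandas as pd

# Ten rows split into two groups: grp = i % 2.
csv = "grp,val\n" + "\n".join(f"{i % 2},{i}" for i in range(10))

# One partial aggregate per chunk, never the full data.
partials = []
for chunk in pd.read_csv(io.StringIO(csv), chunksize=4):
    partials.append(chunk.groupby("grp")["val"].sum())

# Combine the partial sums into the final per-group result.
result = pd.concat(partials).groupby(level=0).sum()
print(result.to_dict())  # {0: 20, 1: 25}
```

This works because sums compose: summing per-chunk sums gives the global sum. Aggregates that do not compose this way (e.g. a median) need a different strategy.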