
pandas: reading a CSV in chunks

Process a large CSV in blocks of rows (chunksize), streaming it instead of loading it all at once.

python pandas #pandas #csv #chunksize

goal

Read a large CSV in fixed-size blocks of rows (chunksize) so that only one block is held in memory at a time.

minimal code

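import io

import pandas as pd

# Small in-memory CSV standing in for a large file: ids 0..9 and their squares.
csv_text = "id,val\n" + "\n".join(f"{i},{i * i}" for i in range(10))
# With chunksize set, read_csv returns an iterator of DataFrames (4 rows each here).
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)
total = 0
for chunk in reader:
    total += chunk["val"].sum()  # keep a running sum, one chunk at a time
print(total)  # 285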

usage

import io

import pandas as pd

# Seven data rows read with chunksize=3 arrive as chunks of 3, 3, and 1 rows.
csv_text = "a\n" + "\n".join(str(i) for i in range(7))
print(sum(len(c) for c in pd.read_csv(io.StringIO(csv_text), chunksize=3)))  # 7

useful variant(s)

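import pandas as pd

# For a file on disk, pass its path; "big.csv" is a placeholder, so the call stays commented out.
# reader = pd.read_csv("big.csv", chunksize=100_000)
print("ok")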
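
A runnable sketch of the on-disk variant, under a few assumptions: a small temporary file stands in for "big.csv", `usecols` limits parsing to the one column needed, and the reader is used as a context manager, which should work on pandas 1.2 or later. The names `path`, `rows`, and `total` are illustrative only.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a large file on disk.
    path = os.path.join(tmp, "big.csv")
    with open(path, "w", newline="") as f:
        f.write("id,val\n" + "\n".join(f"{i},{i * i}" for i in range(10)) + "\n")

    rows = 0
    total = 0
    # The with-block closes the underlying file handle once iteration is done.
    with pd.read_csv(path, usecols=["val"], chunksize=4) as reader:
        for chunk in reader:
            rows += len(chunk)
            total += int(chunk["val"].sum())

print(rows, total)
```

On a real file, a much larger chunksize (such as the 100_000 above) is the usual starting point; the right value depends on row width and available memory.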

notes

  • Accumulate aggregates rather than storing every chunk.
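
As an illustration of that note, here is a minimal sketch that computes a column mean from a running sum and a running count, so no chunk is kept after it is processed. The helper name `chunked_mean` and its parameters are hypothetical, not part of pandas.

```python
import io

import pandas as pd


def chunked_mean(source, column, chunksize):
    """Mean of `column`, holding only a running sum and count in memory."""
    total = 0.0
    count = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += chunk[column].sum()
        count += chunk[column].count()  # count() skips NaN, as sum() does
    return total / count if count else float("nan")


csv_text = "id,val\n" + "\n".join(f"{i},{i * i}" for i in range(10))
mean_val = chunked_mean(io.StringIO(csv_text), "val", chunksize=4)
print(mean_val)
```

The same pattern extends to any aggregate that can be combined across chunks (sum, count, min, max). A median, by contrast, cannot be rebuilt from per-chunk medians.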