objective
Read a CSV with explicit dtypes so pandas does not have to infer column types, which improves load performance and memory use.
minimal code
import io
import pandas as pd

csv = "id,val\n1,10\n2,20\n"
# Declare the dtypes up front so pandas does not have to infer them.
df = pd.read_csv(io.StringIO(csv), dtype={"id": "int32", "val": "int32"})
print(str(df.dtypes["id"]))  # int32
usage
import io
import pandas as pd

csv = "x\n001\n002\n"
# Reading the column as "string" keeps the text as written, leading zeros included.
print(str(pd.read_csv(io.StringIO(csv), dtype={"x": "string"}).dtypes["x"]))
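To show why the explicit dtype matters here, the following sketch reads the same two-row CSV twice, once with default inference and once as "string". The names inferred, explicit, inferred_values and explicit_values are illustrative only and are not part of the original snippet.

```python
import io

import pandas as pd

csv = "x\n001\n002\n"

# Default inference parses the column as integers, so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv))
# An explicit string dtype keeps the text exactly as written.
explicit = pd.read_csv(io.StringIO(csv), dtype={"x": "string"})

inferred_values = inferred["x"].tolist()
explicit_values = explicit["x"].tolist()
print(inferred_values)  # [1, 2]
print(explicit_values)  # ['001', '002']
```

This is the usual reason to force a string dtype on identifier-like columns such as postal codes or zero-padded keys.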
useful variant(s)
import io
import pandas as pd

csv = "x\ntrue\nfalse\n"
# The nullable "boolean" dtype parses true/false; sum() counts the True values.
print(pd.read_csv(io.StringIO(csv), dtype={"x": "boolean"})["x"].sum() == 1)
notes
- Compact dtypes (int32, float32, category) use less RAM than the defaults pandas infers (int64, float64, and object or string columns).
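
The memory claim in the note above can be checked with a rough sketch like the one below. The data is synthetic and made up for illustration, and the column names id, val and grp are assumptions. The exact byte counts depend on the pandas version, so only the comparison between the two totals is meaningful.

```python
import io

import pandas as pd

# Synthetic data, made up for illustration only.
rows = "\n".join(f"{i},{i * 0.5},{'a' if i % 2 else 'b'}" for i in range(1000))
csv = "id,val,grp\n" + rows + "\n"

# Baseline: let pandas infer the column types.
default_df = pd.read_csv(io.StringIO(csv))
# Same data with compact dtypes declared explicitly.
compact_df = pd.read_csv(
    io.StringIO(csv),
    dtype={"id": "int32", "val": "float32", "grp": "category"},
)

# deep=True also counts the memory held by string objects.
default_bytes = int(default_df.memory_usage(deep=True).sum())
compact_bytes = int(compact_df.memory_usage(deep=True).sum())
print(default_bytes, compact_bytes)
```

The category dtype pays off here because grp has only two distinct values; on a column with mostly unique values it may not save anything.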