pandas: downcast int/float to save memory

Reduce RAM usage by downcasting numeric columns.

python pandas #pandas #memory #downcast

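objective

Cut a DataFrame's memory footprint by converting numeric columns to the smallest integer or float dtype that can still hold their values, using pd.to_numeric(..., downcast=...).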

minimal code

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"i": np.arange(1000), "f": rng.normal(size=1000)})
# Downcast each column to the smallest dtype that can hold its values.
df2 = df.assign(
    i=pd.to_numeric(df["i"], downcast="unsigned"),
    f=pd.to_numeric(df["f"], downcast="float"),
)
print(str(df2.dtypes["i"]).startswith("uint"))  # True: "i" is now an unsigned dtype

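usage

import numpy as np
import pandas as pd

s = pd.Series(np.arange(10**4))  # default integer dtype (int64 on most platforms)
s2 = pd.to_numeric(s, downcast="unsigned")  # values 0..9999 fit in uint16
print(s2.dtype.itemsize <= s.dtype.itemsize)  # True: bytes per element did not grow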
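
The item size only hints at the saving. A minimal sketch of measuring it on a whole frame with DataFrame.memory_usage follows; the frame size and the explicit int64 dtype are assumptions chosen so the numbers do not depend on the platform.

```python
import numpy as np
import pandas as pd

# Assumed sample data: 100,000 rows, one int64 column and one float64 column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "i": np.arange(100_000, dtype=np.int64),
    "f": rng.normal(size=100_000),
})

small = df.assign(
    i=pd.to_numeric(df["i"], downcast="unsigned"),
    f=pd.to_numeric(df["f"], downcast="float"),
)

# Bytes used by the column data, index excluded.
before = df.memory_usage(index=False).sum()
after = small.memory_usage(index=False).sum()

print(df.dtypes.to_dict(), before)
print(small.dtypes.to_dict(), after)
```

Here both columns go from 8 bytes to 4 bytes per value, so the column data should take about half the memory.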

useful variant(s)

import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])  # float64 by default
print(pd.to_numeric(s, downcast="float").dtype)  # float32, the smallest float dtype pandas downcasts to
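
Another variant is to apply the same call to every numeric column at once. The helper below is only a sketch: the name downcast_numeric is hypothetical (not a pandas function), and it assumes unique column names and plain NumPy-backed columns.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with int and float columns downcast where possible.

    Hypothetical helper for illustration; assumes unique column names.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if is_integer_dtype(s):
            # Unsigned dtypes are only an option when no value is negative.
            kind = "unsigned" if len(s) and s.min() >= 0 else "integer"
            out[col] = pd.to_numeric(s, downcast=kind)
        elif is_float_dtype(s):
            out[col] = pd.to_numeric(s, downcast="float")
        # Non-numeric columns (strings, dates, ...) are left untouched.
    return out


df = pd.DataFrame({
    "a": [0, 1, 255],       # non-negative integers
    "b": [-1, 0, 1],        # signed integers
    "c": [0.5, 1.5, 2.5],   # floats
    "d": ["x", "y", "z"],   # not numeric
})
result = downcast_numeric(df)
print(result.dtypes)
```

Picking "unsigned" only for non-negative columns matters because a single negative value makes an unsigned downcast a no-op, whereas "integer" can still shrink the column to a signed dtype.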

notes

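  • Downcast with care: check value ranges and NaN first. A negative value prevents an unsigned downcast, NaN keeps a column as float, and float32 holds only about 7 significant digits.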
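
The range and NaN caveats can be checked directly. This is a small sketch with made-up values; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Range: one negative value rules out every unsigned dtype,
# so the column is returned with its original dtype.
neg = pd.Series([-1, 0, 1], dtype="int64")
neg_dtype = pd.to_numeric(neg, downcast="unsigned").dtype

# NaN: missing values make the column float64, and an integer
# downcast cannot represent them, so the dtype does not change.
with_nan = pd.Series([1.0, np.nan, 3.0])
nan_dtype = pd.to_numeric(with_nan, downcast="integer").dtype

# Precision: these values are downcast to float32, whose nearest
# representable numbers differ slightly from the float64 originals.
original = pd.Series([0.1, 0.2, 0.3])
f32 = pd.to_numeric(original, downcast="float")
exact = bool((f32.astype("float64") == original).all())

print(neg_dtype, nan_dtype, f32.dtype, exact)
```

The first two columns keep their int64 and float64 dtypes, while the third becomes float32 and no longer compares equal to the original values. If exact equality matters, it is safer to leave float columns alone.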