Featuretools: automated feature engineering

automatically generating relational features

goal

Explain and show how to generate relational features automatically from linked tables.

minimal code

import pandas as pd
import featuretools as ft

# Parent table (one row per customer) and child table (one row per order)
customers = pd.DataFrame({"customer_id": [1, 2, 3], "zip": [75001, 69001, 13001]})
orders = pd.DataFrame({"order_id": [11, 12, 13, 14], "customer_id": [1, 2, 2, 3], "amount": [20, 35, 15, 40]})

# Register both tables in an EntitySet and link them: one customer has many orders
es = ft.EntitySet(id="retail")
es = es.add_dataframe(dataframe_name="customers", dataframe=customers, index="customer_id")
es = es.add_dataframe(dataframe_name="orders", dataframe=orders, index="order_id")
es = es.add_relationship("customers", "customer_id", "orders", "customer_id")

# Run Deep Feature Synthesis: aggregate each customer's orders into one row per customer
fm, feats = ft.dfs(entityset=es, target_dataframe_name="customers", agg_primitives=["sum", "mean", "count"])
fm.head()
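
To make it concrete what the three aggregation primitives compute here, the same per-customer aggregates can be sketched by hand in plain pandas. This is an illustration, not what Featuretools runs internally. The name manual_fm is hypothetical, and the column labels are an assumption meant to mirror the Featuretools naming convention.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "zip": [75001, 69001, 13001]})
orders = pd.DataFrame({
    "order_id": [11, 12, 13, 14],
    "customer_id": [1, 2, 2, 3],
    "amount": [20, 35, 15, 40],
})

# Aggregate the child table (orders) per parent key (customer_id).
agg = orders.groupby("customer_id")["amount"].agg(["sum", "mean", "count"])

# Labels chosen to resemble Featuretools feature names (assumption).
agg.columns = ["SUM(orders.amount)", "MEAN(orders.amount)", "COUNT(orders)"]

# Attach the aggregates to the parent table, one row per customer.
manual_fm = customers.set_index("customer_id").join(agg)
print(manual_fm)
```

Writing this by hand for every column and every relationship quickly becomes tedious, which is the work DFS automates. Note that with this manual join a customer without any orders would get missing values in the aggregate columns.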

usage

# restrict the list of primitives to speed up computation
fm2, _ = ft.dfs(entityset=es, target_dataframe_name="customers", agg_primitives=["sum","count"])

useful variant(s)

# parallel computation via Dask (see the Featuretools docs)

notes

  • DFS (Deep Feature Synthesis) builds features by composing aggregation and transformation primitives across related tables.
  • Control the depth (max_depth) and the list of primitives to keep the number of generated features manageable.