pandas: read_html (tables)

Extract HTML tables without writing a complex scraper.

goal

Turn the <table> elements of an HTML document into pandas DataFrames, one per table, without a custom scraper.

minimal code

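from io import StringIO
import pandas as pd
html = "<table><tr><th>a</th></tr><tr><td>1</td></tr></table>"
# Wrap literal HTML in StringIO: passing a raw string is deprecated since pandas 2.1.
dfs = pd.read_html(StringIO(html))  # list with one DataFrame per <table>
print(isinstance(dfs[0], pd.DataFrame))  # True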
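To see what the minimal example actually parsed, here is a small sketch on the same toy table (assuming lxml is installed as the parser). The <th> cell becomes the column label and the <td> cell becomes a data row, with numeric text converted to numbers.

```python
from io import StringIO

import pandas as pd

# Same toy table: one <th> header cell, one <td> data cell.
html = "<table><tr><th>a</th></tr><tr><td>1</td></tr></table>"
df = pd.read_html(StringIO(html))[0]

# The <th> text becomes the column label; the <td> becomes a data row.
print(list(df.columns))  # ['a']
print(df["a"].tolist())  # [1]
```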

usage

from io import StringIO
import pandas as pd
# One <table> in the input, so the returned list has length 1.
print(len(pd.read_html(StringIO("<table><tr><td>x</td></tr></table>"))))
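Because read_html returns one DataFrame per <table>, len() counts the tables found. When a page holds several tables, the match= parameter keeps only those whose text matches a string or regex. A minimal sketch on a hypothetical two-table document (toy data, lxml assumed installed):

```python
from io import StringIO

import pandas as pd

# Hypothetical document with two tables (toy data).
html = """
<table><tr><th>fruit</th><th>qty</th></tr><tr><td>apple</td><td>3</td></tr></table>
<table><tr><th>item</th><th>price</th></tr><tr><td>pen</td><td>2</td></tr></table>
"""

# Without match=, every <table> is returned, in document order.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))  # 2

# match= keeps only tables whose text matches the string/regex.
prices = pd.read_html(StringIO(html), match="price")
print(len(prices), list(prices[0].columns))  # 1 ['item', 'price']
```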

useful variant(s)

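from io import StringIO
import pandas as pd
# Two <td> cells and no <th>: the columns get default labels 0 and 1.
html = "<table><tr><td>1</td><td>2</td></tr></table>"
print(pd.read_html(StringIO(html))[0].shape[1])  # number of columns: 2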
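When a table has no <th> cells, as in the variant above, the columns get integer labels 0, 1, and so on. The header= parameter can promote a row to column labels, and attrs= filters tables by their HTML attributes. A sketch assuming a hypothetical table with id="scores" whose first <td> row holds the labels:

```python
from io import StringIO

import pandas as pd

# Hypothetical table without <th>: the first <td> row holds the labels.
html = (
    '<table id="scores">'
    "<tr><td>name</td><td>score</td></tr>"
    "<tr><td>ana</td><td>12</td></tr>"
    "</table>"
)

# Without header=, columns get default integer labels.
raw = pd.read_html(StringIO(html))[0]
print(list(raw.columns))  # [0, 1]

# header=0 promotes the first row to column labels;
# attrs= keeps only tables with matching HTML attributes.
df = pd.read_html(StringIO(html), header=0, attrs={"id": "scores"})[0]
print(list(df.columns), df.shape)  # ['name', 'score'] (1, 2)
```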

notes

  • Requires an HTML parser (lxml, or BeautifulSoup4 with html5lib); handy for quick-and-dirty ETL.
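For quick ETL scripts, two behaviours are worth handling: read_html raises ValueError when no table is found, and with the default flavor a failed lxml parse falls back to bs4 + html5lib. The hypothetical helper extract_tables below pins flavor="lxml" (assumption: lxml is installed), turns the no-table case into an empty list, then dumps a parsed table to CSV in memory.

```python
from io import StringIO

import pandas as pd


def extract_tables(html: str) -> list[pd.DataFrame]:
    """Hypothetical helper: parse every <table> with lxml, or return [] if none."""
    try:
        # Pinning flavor="lxml" avoids the default fallback to bs4 + html5lib.
        return pd.read_html(StringIO(html), flavor="lxml")
    except ValueError:
        # read_html raises ValueError when no <table> is found; the catch is
        # coarse, so other ValueErrors are swallowed too.
        return []


print(len(extract_tables("<p>no table here</p>")))  # 0

# Quick ETL step: parse the table, then write it as CSV (in memory here).
tables = extract_tables("<table><tr><th>a</th></tr><tr><td>1</td></tr></table>")
buf = StringIO()
tables[0].to_csv(buf, index=False)
print(buf.getvalue())
```

Catching ValueError broadly keeps the sketch short; a real pipeline may want to inspect the error message before discarding it.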