objective
Extract HTML tables without writing a complex scraper.
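minimal code
import pandas as pd
from io import StringIO
html = "<table><tr><th>a</th></tr><tr><td>1</td></tr></table>"
# Wrap literal HTML in StringIO: pandas >= 2.1 deprecates passing a raw HTML string.
dfs = pd.read_html(StringIO(html))  # list of DataFrames, one per <table>
print(isinstance(dfs[0], pd.DataFrame))  # True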
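A slightly larger sketch, built on an invented two-table HTML string, shows what read_html returns: one DataFrame per <table>, with a leading row of <th> cells used as column names and numeric cells converted to numbers. It assumes lxml (or BeautifulSoup4 plus html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Invented input: the first table has a <th> header row, the second does not.
html = """
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
<table>
  <tr><td>x</td><td>y</td></tr>
</table>
"""

dfs = pd.read_html(StringIO(html))  # one DataFrame per <table>
first = dfs[0]
print(len(dfs))             # 2
print(list(first.columns))  # ['name', 'qty']
print(first["qty"].sum())   # 8
```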
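usage
import pandas as pd
from io import StringIO
# One <table> in the input, so the returned list holds one DataFrame.
print(len(pd.read_html(StringIO("<table><tr><td>x</td></tr></table>"))))  # 1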
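When a page holds several tables, the match parameter of read_html keeps only the tables whose text matches a string or regular expression, which narrows the list before indexing into it. A minimal sketch on an invented two-table snippet:

```python
from io import StringIO

import pandas as pd

html = """
<table><tr><th>city</th></tr><tr><td>Paris</td></tr></table>
<table><tr><th>price</th></tr><tr><td>10</td></tr></table>
"""

# match= keeps only tables containing text that matches "price".
dfs = pd.read_html(StringIO(html), match="price")
print(len(dfs))           # 1
print(dfs[0].columns[0])  # price
```

In real use the first argument is usually a URL, a file path, or a file-like object rather than an inline string.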
useful variant(s)
import pandas as pd
from io import StringIO
html = "<table><tr><td>1</td><td>2</td></tr></table>"
# shape[1] is the column count: two <td> cells give 2 columns.
print(pd.read_html(StringIO(html))[0].shape[1])  # 2
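For tables with no <th> cells, the header parameter chooses which row supplies the column names. This sketch compares the default with header=0 on an invented two-row table:

```python
from io import StringIO

import pandas as pd

html = (
    "<table>"
    "<tr><td>a</td><td>b</td></tr>"
    "<tr><td>1</td><td>2</td></tr>"
    "</table>"
)

# Default: no <th> row, so columns are positional integers and both rows are data.
raw = pd.read_html(StringIO(html))[0]
print(list(raw.columns), raw.shape)

# header=0: the first row becomes the column names, leaving one data row.
named = pd.read_html(StringIO(html), header=0)[0]
print(list(named.columns), named.shape)
```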
notes
- Requires an HTML parser backend: lxml, or BeautifulSoup4 together with html5lib. Handy for quick-and-dirty ETL.
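To see which backend is usable before parsing, one option is to probe the imports and pass the result as flavor, which read_html accepts as a list of engines to try in order ("lxml", or "bs4", which needs html5lib as well). The helper available_flavors is hypothetical, written only for this sketch.

```python
import importlib.util
from io import StringIO

import pandas as pd


def available_flavors() -> list[str]:
    # Hypothetical helper: lists read_html engines whose dependencies are importable.
    flavors = []
    if importlib.util.find_spec("lxml") is not None:
        flavors.append("lxml")
    # The "bs4" flavor needs both BeautifulSoup4 and html5lib.
    if all(importlib.util.find_spec(m) is not None for m in ("bs4", "html5lib")):
        flavors.append("bs4")
    return flavors


flavors = available_flavors()
if flavors:
    html = "<table><tr><td>1</td></tr></table>"
    # A list of flavors is tried in order until one parses the input.
    df = pd.read_html(StringIO(html), flavor=flavors)[0]
    print(flavors, df.shape)
else:
    print("no HTML parser: install lxml, or beautifulsoup4 + html5lib")
```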