BeautifulSoup makes it easy to quickly scrape content from web pages. Here are two examples.

Electricity prices from Tocom: https://www.tocom.or.jp/market/kobetu/east_base_elec.html

The page has three blocks (Current, Night, and Day sessions). Each block sits under an h3 and contains two tables: the first gives the session name and date, and the second gives the prices. The first table is a bare table consisting only of table rows, while the second has a thead element.

To parse the entire page, we loop through each h3 and use find_next_sibling to get the two tables.
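As a rough sketch of that loop, assuming a simplified stand-in for the page (the HTML string and the function name find_blocks below are made up for illustration and are not taken from the full source):

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down imitation of the layout described above:
# each h3 is followed by an info table and then a price table.
HTML = """
<div>
<h3>Current Session</h3>
<table><tr><td>Trade Date: Oct 16, 2019</td></tr></table>
<table><thead><tr><th>Month</th></tr></thead><tbody><tr><td>Oct 2019</td></tr></tbody></table>
<h3>Night Session</h3>
<table><tr><td>Trade Date: Oct 15, 2019</td></tr></table>
<table><thead><tr><th>Month</th></tr></thead><tbody><tr><td>Oct 2019</td></tr></tbody></table>
</div>
"""


def find_blocks(soup):
    """Return (heading, info_table, price_table) for each h3 on the page."""
    blocks = []
    for h3 in soup.find_all("h3"):
        # find_next_sibling skips text nodes and returns the next matching tag.
        info_table = h3.find_next_sibling("table")
        price_table = info_table.find_next_sibling("table") if info_table else None
        blocks.append((h3.get_text(strip=True), info_table, price_table))
    return blocks


soup = BeautifulSoup(HTML, "html.parser")
blocks = find_blocks(soup)
for heading, info_table, price_table in blocks:
    print(heading, "|", info_table.get_text(strip=True))
```

Calling find_next_sibling twice keeps the two tables paired with their own h3, which is enough as long as every block really has exactly two tables after its heading.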

To parse a table, we can just look for any tr elements and then pull out all the td elements. We check whether each node supports find_all by calling hasattr. This is a quick and dirty way to skip over the text nodes between the table rows.
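One way the hasattr check might look, assuming we walk the direct children of the table (the sample HTML and the name parse_bare_table are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical bare table: rows only, no thead or tbody.
HTML = """
<table>
  <tr><td>Current Trading (16:30 - 15:15)</td></tr>
  <tr><td>Trade Date: Oct 16, 2019</td><td>Prices in yen / kWh</td></tr>
</table>
"""


def parse_bare_table(table):
    """Return a list of rows, each a list of the td cell texts."""
    rows = []
    for child in table.children:
        # The whitespace between rows appears as string nodes, which have
        # no find_all method; tag nodes do.
        if not hasattr(child, "find_all"):
            continue
        cells = [td.get_text(strip=True) for td in child.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


soup = BeautifulSoup(HTML, "html.parser")
info_rows = parse_bare_table(soup.find("table"))
print(info_rows)
```

A stricter alternative is isinstance(child, Tag) with Tag imported from bs4, but for a quick script the hasattr test does the same job.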

To parse a table with a header, we parse the rows as before and also look up the thead element and collect all of its th elements.
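A minimal sketch of the header case, again on made-up HTML and with a hypothetical function name (parse_table_with_header):

```python
from bs4 import BeautifulSoup

# Hypothetical price table with a thead, trimmed to three columns.
HTML = """
<table>
  <thead><tr><th>Month</th><th>Last Settlement Price</th><th>Open</th></tr></thead>
  <tbody>
    <tr><td>Oct 2019</td><td>9.73</td><td>-</td></tr>
    <tr><td>Nov 2019</td><td>9.16</td><td>-</td></tr>
  </tbody>
</table>
"""


def parse_table_with_header(table):
    """Return (header, rows): th texts from thead, td texts from each tr."""
    thead = table.find("thead")
    header = [th.get_text(strip=True) for th in thead.find_all("th")] if thead else []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row holds only th cells, so it is skipped
            rows.append(cells)
    return header, rows


soup = BeautifulSoup(HTML, "html.parser")
header, rows = parse_table_with_header(soup.find("table"))
print(header)
print(rows)
```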

Once we have all the tables, it is straightforward to convert them into Pandas DataFrames. See the full source for how to do this: https://github.com/carlohamalainen/playground/blob/master/python/beautiful_soup_4/tocom_kobetu_prices.py
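For a rough idea of the conversion without opening the full source, here is a sketch that assumes the header and rows have already been parsed into plain lists. The numeric conversion at the end is an optional extra and is not necessarily what the linked script does:

```python
import pandas as pd

# Assumed input: a header list and row lists as produced by a table parser.
header = ["Month", "Last Settlement Price", "Open"]
rows = [["Oct 2019", "9.73", "-"], ["Nov 2019", "9.16", "-"]]

df = pd.DataFrame(rows, columns=header)

# Optional: turn the price columns into floats; "-" placeholders become NaN.
for col in ["Last Settlement Price", "Open"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df)
```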

Sample output:

$ python tocom_kobetu_prices.py https://www.tocom.or.jp/market/kobetu/east_base_elec.html

Current Trading (16:30 - 15:15)
Trade Date: Oct 16, 2019
Prices in yen / kWh

       Month  Last Settlement Price  Open  High  Low  Close  Change  Volume  Settlement
 0  Oct 2019                   9.73     -     -    -      -       -       -           -
 1  Nov 2019                   9.16     -     -    -      -       -       -           -
 2  Dec 2019                  10.15     -     -    -      -       -       -           -
 3  Jan 2020                  10.71     -     -    -      -       -       -           -
 4  Feb 2020                  10.72     -     -    -      -       -       -           -
 5  Mar 2020                   9.28     -     -    -      -       -       -           -
 6  Apr 2020                   9.05     -     -    -      -       -       -           -
 7  May 2020                   9.02     -     -    -      -       -       -           -
 8  Jun 2020                   9.04     -     -    -      -       -       -           -
 9  Jul 2020                  10.24     -     -    -      -       -       -           -
10  Aug 2020                  10.07     -     -    -      -       -       -           -
11  Sep 2020                   9.11     -     -    -      -       -       -           -
12  Oct 2020                   8.97     -     -    -      -       -       -           -
13  Nov 2020                   8.81     -     -    -      -       -       -           -
14  Dec 2020                   9.32     -     -    -      -       -       -           -

The next example is scraping stock prices from Yahoo. I used to use Alphavantage for daily closing prices but their free API doesn’t seem to work at the moment. (Their API says that I have been rate limited, but I was only querying it once a day for a handful of equities).

Luckily for us, the historical pages on Yahoo have a json blob in the middle with all the info that we need, so we can avoid parsing HTML tables. We just grab the content after root.App.main and parse as json.

That’s it! The rest is just manipulating the json to get the fields of interest.

Full source: https://github.com/carlohamalainen/playground/blob/master/python/beautiful_soup_4/yahoo_stock_prices.py

Sample run:

$ python yahoo_stock_prices.py

AUD to SGD: ('AUDSGD=X', 'Europe/London', '4:21PM BST', 0.9273)

BHP.AX AUD 2018-10-15 33.90
BHP.AX AUD 2018-10-16 33.66
BHP.AX AUD 2018-10-17 33.20
BHP.AX AUD 2018-10-18 33.10
BHP.AX AUD 2018-10-21 33.16
BHP.AX AUD 2018-10-22 32.79
BHP.AX AUD 2018-10-23 32.07
BHP.AX AUD 2018-10-24 30.80
BHP.AX AUD 2018-10-25 31.20
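
For reference, the root.App.main step in the Yahoo example might be sketched as follows. The HTML snippet, the JSON keys, and the function name extract_app_main are invented for illustration; the real page layout may differ or may have changed since this was written:

```python
import json

from bs4 import BeautifulSoup

MARKER = "root.App.main"

# Hypothetical page fragment; the real blob is much larger and its keys differ.
HTML = """
<html><body>
<script>
(function (root) {
root.App.main = {"context": {"prices": [{"date": 1539561600, "close": 33.9}]}};
}(this));
</script>
</body></html>
"""


def extract_app_main(html):
    """Find the script containing the marker and decode the JSON after it."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""
        pos = text.find(MARKER)
        if pos == -1:
            continue
        # The JSON object starts at the first "{" after the marker.
        start = text.index("{", pos)
        # raw_decode parses one JSON value and ignores whatever follows it.
        obj, _end = json.JSONDecoder().raw_decode(text, start)
        return obj
    raise ValueError("marker not found in any script tag")


data = extract_app_main(HTML)
print(data["context"]["prices"][0]["close"])
```

Using raw_decode avoids having to work out where the blob ends, since the trailing semicolon and the rest of the script are simply left unparsed.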