BeautifulSoup makes it easy to quickly scrape content from web pages. Here are two examples.
Electricity prices from Tocom: https://www.tocom.or.jp/market/kobetu/east_base_elec.html
The page has three blocks (Current, Night, and Day sessions). Each block sits under an h3 followed by two tables: the first provides the session name and date, and the second provides the prices. The first table is a bare table consisting only of table rows, while the second table has a thead element.

To parse the entire page, we loop through each h3 and use find_next_sibling to get the two tables.
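import bs4
import requests

url = 'https://www.tocom.or.jp/market/kobetu/east_base_elec.html'
soup = bs4.BeautifulSoup(requests.get(url).content,
                         features='html.parser')
tables = {}  # session name -> [info rows, (header, price rows)]
h3 = soup.find('h3')
while h3 is not None:
    name = h3.contents[0].strip()
    # The two tables after each h3 hold the session info and the prices.
    table0 = h3.find_next_sibling('table')
    table1 = table0.find_next_sibling('table')
    tables[name] = [parse_rows(table0),
                    parse_html_table_with_header(table1)]
    h3 = h3.find_next_sibling('h3')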
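To try the sibling walk without hitting the network, here is a minimal sketch on an inline document. The HTML below is a made-up stand-in that only mimics the layout described above (an h3 followed by two tables); it is not the real Tocom markup, and the `sessions` dict is a name chosen for this illustration.

```python
import bs4

# Hypothetical stand-in for the page layout: h3, info table, price table.
html = """
<h3>Day Session <span>(8:45 - 15:15)</span></h3>
<table><tr><td>Trade Date: Oct 16, 2019</td></tr></table>
<table><thead><tr><th>Month</th></tr></thead>
<tbody><tr><td>Oct 2019</td></tr></tbody></table>
<h3>Night Session</h3>
<table><tr><td>Trade Date: Oct 17, 2019</td></tr></table>
<table><thead><tr><th>Month</th></tr></thead>
<tbody><tr><td>Nov 2019</td></tr></tbody></table>
"""

soup = bs4.BeautifulSoup(html, features='html.parser')

sessions = {}
h3 = soup.find('h3')
while h3 is not None:
    # contents[0] is the text before any nested tag inside the h3.
    name = h3.contents[0].strip()
    info = h3.find_next_sibling('table')      # first table after the h3
    prices = info.find_next_sibling('table')  # second table after the h3
    sessions[name] = (info.td.text.strip(), prices.tbody.td.text.strip())
    h3 = h3.find_next_sibling('h3')           # None after the last block

print(sessions)
```

Note that find_next_sibling('table') does not stop at the next h3, so this approach assumes every block really has two tables; a block with only one would silently pick up a table from the following block.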
To parse a table we can just look for any tr elements and then pull out all the td elements. We use hasattr to check whether the object we are given supports find_all. This is a quick and dirty way to skip over textual elements between the table rows.
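def parse_rows(x):
    rows = []
    # Only tags support find_all; bare strings between rows are skipped.
    if hasattr(x, 'find_all'):
        for row in x.find_all('tr'):
            cols = row.find_all('td')
            # Assumed: keep the stripped text of each cell, dropping empty ones.
            cols = [ele.text.strip() for ele in cols]
            this_row = [ele for ele in cols if ele]
            if cols:
                rows.append(this_row)
    return rows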
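The hasattr check can be seen in isolation with a small sketch. The table below is hypothetical; the point is only that the newlines between rows show up as string children without a find_all method. An isinstance check against bs4.Tag is shown as an alternative that, as far as I can tell, selects the same children.

```python
import bs4

# Hypothetical table; the newlines between rows become bare string children.
html = "<table>\n<tr><td> a </td><td>b</td></tr>\n<tr><td>c</td><td></td></tr>\n</table>"
table = bs4.BeautifulSoup(html, features='html.parser').table

# Each child is either a Tag (has find_all) or a NavigableString (does not).
kinds = [type(child).__name__ for child in table.children]

via_hasattr = [c for c in table.children if hasattr(c, 'find_all')]
via_isinstance = [c for c in table.children if isinstance(c, bs4.Tag)]

# Stripped cell text per row; empty cells are kept here so columns stay aligned.
cells = [[td.text.strip() for td in tr.find_all('td')] for tr in via_isinstance]

print(kinds)
print(cells)
```

Whether to drop empty cells is a design choice: dropping them keeps rows tidy, but keeping them preserves the column positions if a header is attached later.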
To parse a table with a header, we do the usual for the rows and also search for thead and all th elements.
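def parse_html_table_with_header(t):
    rows = []
    # Walk the table's direct children (e.g. thead, tbody) and collect
    # their rows; parse_rows ignores the bare strings in between.
    for bits in t:
        x = parse_rows(bits)
        if x != []:
            rows += x
    header = [h.text.strip() for h in
              t.find('thead').find_all('th')]
    return (header, rows)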
Once we have all the tables, it is straightforward to convert them into Pandas DataFrames. See the full source for how to do this: https://github.com/carlohamalainen/playground/blob/master/python/beautiful_soup_4/tocom_kobetu_prices.py
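As a rough idea of what the conversion might look like, here is a minimal sketch that turns one (header, rows) pair into a DataFrame. It is not necessarily how the linked script does it. The header is shortened to three columns for brevity, the two rows reuse values from the sample output below, and coercing '-' to NaN is an assumption about how missing prices should be treated.

```python
import pandas as pd

# A (header, rows) pair in the shape returned by the table parser above.
header = ['Month', 'Last Settlement Price', 'Open']
rows = [['Oct 2019', '9.73', '-'],
        ['Nov 2019', '9.16', '-']]

df = pd.DataFrame(rows, columns=header)

# Scraped cells are strings; convert the price columns to numbers.
# errors='coerce' maps anything unparseable (here '-') to NaN.
for col in ['Last Settlement Price', 'Open']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df)
```

Leaving the cells as strings is also reasonable if the goal is only to display the table, which is what the sample output does.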
Sample output:
$ python tocom_kobetu_prices.py
https://www.tocom.or.jp/market/kobetu/east_base_elec.html
Current Trading (16:30 - 15:15)
Trade Date: Oct 16, 2019
Prices in yen / kWh
       Month  Last Settlement Price  Open  High  Low  Close  Change  Volume  Settlement
0   Oct 2019                   9.73     -     -    -      -       -       -           -
1   Nov 2019                   9.16     -     -    -      -       -       -           -
2   Dec 2019                  10.15     -     -    -      -       -       -           -
3   Jan 2020                  10.71     -     -    -      -       -       -           -
4   Feb 2020                  10.72     -     -    -      -       -       -           -
5   Mar 2020                   9.28     -     -    -      -       -       -           -
6   Apr 2020                   9.05     -     -    -      -       -       -           -
7   May 2020                   9.02     -     -    -      -       -       -           -
8   Jun 2020                   9.04     -     -    -      -       -       -           -
9   Jul 2020                  10.24     -     -    -      -       -       -           -
10  Aug 2020                  10.07     -     -    -      -       -       -           -
11  Sep 2020                   9.11     -     -    -      -       -       -           -
12  Oct 2020                   8.97     -     -    -      -       -       -           -
13  Nov 2020                   8.81     -     -    -      -       -       -           -
14  Dec 2020                   9.32     -     -    -      -       -       -           -
The next example is scraping stock prices from Yahoo. I used to use Alphavantage for daily closing prices but their free API doesn’t seem to work at the moment. (Their API says that I have been rate limited, but I was only querying it once a day for a handful of equities).
Luckily for us, the historical pages on Yahoo have a JSON blob in the middle with all the info that we need, so we can avoid parsing HTML tables. We just grab the content after root.App.main and parse it as JSON:
import json
import re

import bs4
import requests

base = 'https://sg.finance.yahoo.com/quote'
ticker = 'BHP.AX'
url = f'{base}/{ticker}/history/'
x = requests.get(url).content
soup = bs4.BeautifulSoup(x, features='html.parser')
# https://stackoverflow.com/questions/39631386/how-to-understand-this-raw-html-of-yahoo-finance-when-retrieving-data-using-pyt
script = soup.find('script', string=re.compile(r'root\.App\.main')).string
# parse_float=lambda x: x keeps prices as the exact strings from the page.
j = json.loads(re.search(r'root\.App\.main\s+=\s+(\{.*\})', script).group(1),
               parse_float=lambda x: x)
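The extraction step can be tried offline with only the standard library. In this minimal sketch the script text and the field names ("price", "close", "currency") are invented for illustration; the real blob is much larger and its layout is not shown in this post.

```python
import json
import re

# Hypothetical script body with the JSON assignment on a single line.
script = (
    '(function (root) {\n'
    'root.App.main = {"price": {"close": 33.90, "currency": "AUD"}};\n'
    '}(this));'
)

# Greedy .* runs to the last closing brace on the same line, because
# '.' does not match newlines by default.
match = re.search(r'root\.App\.main\s+=\s+(\{.*\})', script)
raw = match.group(1)

# Identity parse_float: floats come back as their original strings.
as_strings = json.loads(raw, parse_float=lambda s: s)
# Default behaviour for comparison: floats become Python floats.
as_floats = json.loads(raw)

print(as_strings['price']['close'], as_floats['price']['close'])
```

Keeping the prices as strings preserves trailing zeros such as the one in 33.90, which matters if the values are printed or later passed to decimal.Decimal. The regular expression does rely on the whole blob sitting on one line.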
That’s it! The rest is just manipulating the JSON to get the fields of interest. Full source: https://github.com/carlohamalainen/playground/blob/master/python/beautiful_soup_4/yahoo_stock_prices.py
Sample run:
$ python yahoo_stock_prices.py
AUD to SGD: ('AUDSGD=X', 'Europe/London', '4:21PM BST', 0.9273)
BHP.AX AUD 2018-10-15 33.90
BHP.AX AUD 2018-10-16 33.66
BHP.AX AUD 2018-10-17 33.20
BHP.AX AUD 2018-10-18 33.10
BHP.AX AUD 2018-10-21 33.16
BHP.AX AUD 2018-10-22 32.79
BHP.AX AUD 2018-10-23 32.07
BHP.AX AUD 2018-10-24 30.80
BHP.AX AUD 2018-10-25 31.20