Automating Historical Price Data Downloads From Yahoo Finance Using Selenium In Python

[Screenshots of the full code: yf1.png, yf2.png]

A few days ago I wrote about getting historical price data using Python modules like yfinance and finnhub. While using modules and APIs like that is super efficient, sometimes it is also useful to download such data directly from websites. I am not just talking about historical price data. Automating any kind of file download is quite useful, especially if one has to deal with a large number of files and repeats the process frequently.

Today I played a little bit with Selenium to automate downloading historical price data from the Yahoo Finance website. Above you can see the screenshots of the full code. It is not perfect and it does need improvements, but it works. Let me share my experience with Selenium, explain the initial setup, and talk a little bit about what the code does.

In the past I have used web automation in its simplest form, using the basic Chrome webdriver. It was the time when I was obsessed with curation and would spend countless hours reading posts. I had heard about Selenium, but never actually got to use it until now. It is awesome!

To use Selenium in our Python code we need to do some initial installation and configuration:

  1. Run pip install selenium, just like installing any other module. Some will have to use pip3 or pip3.8, depending on their OS and on whether earlier versions of Python are installed.
  2. Go to https://sites.google.com/a/chromium.org/chromedriver/downloads and download the chromedriver that matches your Chrome version. You may have to go to your Chrome menu --> Help --> About Google Chrome to find out the version of Chrome on your machine.
  3. Make sure chromedriver is stored in an easily accessible location. We will need to provide the path to the chromedriver file in the Python code. For Windows users the file will have an .exe extension.

Ok, that should be it for the initial setup. Let's talk about the code itself now.

Lines 7-10 are a helper function, get_file_path. It just makes it easy to build a path to our chromedriver. Later, at line 24, we assign the return value to CHROMEDRIVER_PATH, which tells the driver where our chromedriver file is located.
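Since the code only appears in the screenshots, here is a rough sketch of what a helper like this might look like. The body is my assumption, not a copy of the original: it simply joins path parts onto the home directory, and the "drivers" folder name is hypothetical.

```python
import os


def get_file_path(*parts):
    """Join the given parts onto the user's home directory."""
    return os.path.join(os.path.expanduser("~"), *parts)


# Hypothetical location: point this at wherever chromedriver was saved.
# On Windows the file name would be "chromedriver.exe".
CHROMEDRIVER_PATH = get_file_path("drivers", "chromedriver")
```

Building the path with os.path.join keeps the same code working on mac, Linux, and Windows.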

Lines 12-22 are another helper function, get_download_folder. When we are ready to download files from the website, we want them to be stored at a specific location. We already have a yf_downloads folder, and we will pass it as an argument to this function later. We also want this function to create a new folder inside yf_downloads named after today's date. That is where we will store all downloaded files. We use this function at line 24 and assign the return value to DOWNLOAD_PATH.
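A minimal sketch of such a function is below. It is an illustration under my own assumptions (the YYYY-MM-DD folder name in particular), not the exact code from the screenshots.

```python
import os
from datetime import date


def get_download_folder(base_folder):
    """Create base_folder/<today's date> if needed and return its absolute path."""
    # Assumed naming scheme: ISO date, e.g. yf_downloads/2020-08-18
    folder = os.path.join(os.path.abspath(base_folder), date.today().isoformat())
    # exist_ok=True makes repeated runs on the same day safe.
    os.makedirs(folder, exist_ok=True)
    return folder
```

It could then be used as DOWNLOAD_PATH = get_download_folder("yf_downloads"). Chrome expects an absolute path for its download directory, which is why the sketch calls os.path.abspath.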

Lines 27-31: here we configure our webdriver using ChromeOptions() to accomplish a couple of important things. With lines 28-29 we change the default download path. Normally on a mac, web downloads go to the Downloads folder. With lines 30-31 we make the browser run headless. By default, without headless, Selenium will open the Chrome browser on the screen, execute the actions, and then close it. With headless we will not see these actions on the screen. Everything will be done behind the scenes.
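The configuration described above can be sketched roughly like this. The function names build_download_prefs and make_driver are hypothetical, and the driver is created in the Selenium 4 style with a Service object; older Selenium versions took the chromedriver path directly, so the original code may differ.

```python
def build_download_prefs(download_path):
    """Chrome preferences that send downloads to download_path without a prompt."""
    return {
        "download.default_directory": download_path,
        "download.prompt_for_download": False,
    }


def make_driver(chromedriver_path, download_path, headless=True):
    """Create a Chrome driver with a custom download folder (Selenium 4 style)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(download_path))
    if headless:
        # No browser window is shown; everything runs in the background.
        options.add_argument("--headless")
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

Passing headless=False is handy while debugging, since you can watch what the browser is doing.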

Lines 34-39: here we assign an important variable, interval. It defines the timeframe for the price data. We have three options: '1d' for daily prices, '1wk' for weekly prices, and '1mo' for monthly prices. There was a little issue with the way data was returned for weekly and monthly, and I had to come up with a workaround to fix the end date. That's why we have a date_fixer variable.

Lines 41-47 deal with date variables to figure out the starting date and ending date. Our end date will be today's date for daily stock prices, and the previous day for weekly and monthly prices. The start date will be a year, or 365 days, ago. If we want the maximum historical price data we can assign start='0'.
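Here is one way the interval and date logic could be written. This is a sketch based on the description above rather than the original code: get_period and its parameters are hypothetical names, and I am assuming date_fixer is simply a one-day offset applied to weekly and monthly requests. The query link expects the dates as Unix timestamps.

```python
from datetime import datetime, timedelta

VALID_INTERVALS = ("1d", "1wk", "1mo")


def get_period(interval, now=None, days_back=365, max_history=False):
    """Return (period1, period2) as Unix-timestamp strings."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}")
    now = now or datetime.now()
    # Assumed workaround: weekly and monthly requests end one day earlier.
    date_fixer = timedelta(days=0) if interval == "1d" else timedelta(days=1)
    end = now - date_fixer
    start = now - timedelta(days=days_back)
    # A start of "0" asks for the maximum available history.
    period1 = "0" if max_history else str(int(start.timestamp()))
    return period1, str(int(end.timestamp()))
```

For example, period1, period2 = get_period("1wk") gives a one-year window ending yesterday.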

Line 49 is the list of our stocks. For testing purposes I was just using random stocks.

The rest of the code is where all the fun happens with Selenium. Originally, when I started coding this, I had a different approach. What I wanted to do was give the Selenium driver a list of instructions based on what a manual download would look like. For Yahoo Finance, we would need to go to https://finance.yahoo.com, find the 'Quote Lookup' field, and search for the stock name. Then we would click the 'Historical Data' tab, set the 'Time Period' and 'Frequency' fields, and hit the 'Download' link.

As I was experimenting with that, I realized that as the user changes the time period and frequency, the website creates a query link behind the scenes, which can be seen when inspecting the source code of the page. This actually made the task easier. Instead of clicking and adjusting a bunch of things (Selenium is great at that), I could just change the text of the generated query link. Once this link is put in the browser, it automatically downloads the file. Here is what a sample query link looks like:

https://query1.finance.yahoo.com/v7/finance/download/SQ?period1=1566176492&period2=1597712492&interval=1d&events=history

So I decided to just change the parameters: the stock name and the period1, period2, and interval values. Also, at the end of the for loop you can see time.sleep(1), which gives the downloads enough time to finish before driver.quit() closes the browser/driver.
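To make the idea concrete, here is a minimal sketch of building the query link and looping over the stocks. The names build_query_url and download_all are hypothetical, the one-second pause after each stock is my reading of the description, and the sketch assumes the query link still behaves like the sample above.

```python
import time
from urllib.parse import quote, urlencode

BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download/"


def build_query_url(symbol, period1, period2, interval="1d"):
    """Fill the stock name and query parameters into the download link."""
    params = {
        "period1": period1,
        "period2": period2,
        "interval": interval,
        "events": "history",
    }
    return f"{BASE_URL}{quote(symbol)}?{urlencode(params)}"


def download_all(driver, symbols, period1, period2, interval="1d", pause=1):
    """Open the query link for every symbol, then close the driver."""
    try:
        for symbol in symbols:
            # Opening the link in the browser triggers the file download.
            driver.get(build_query_url(symbol, period1, period2, interval))
            # Give the download a moment before moving on.
            time.sleep(pause)
    finally:
        # Close the browser even if one of the requests fails.
        driver.quit()
```

Here driver is any Selenium webdriver object. For large files a fixed one-second pause may be too short, so checking the download folder for the finished file would be a more robust improvement.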

Selenium is fun.

Posted Using LeoFinance



5 comments

Selenium is incredibly useful, and for Python there's the added support of being able to use the RIDE GUI with it - which makes the development of test suites even faster. Though, for some things, it's just nicer to be able to write it all by hand the way you want.

Great post, and great to see Selenium being talked about here!

Thank you! I haven’t used RIDE GUI yet. Will try when I get a chance.

It's available through a pip install and combines robot-framework with Selenium, and then wraps both in a simple plain-English keyword setup. It's really slick. I do QA as my day job so I've got a suite of verging on 200 tests right now that I run through it every day. Highly recommend if you're doing repetitive testing! Hope you enjoy it when you're able to give it a try!

I have picked your post for my daily hive voting initiative, Keep it up and Hive On!!
