Automate Websites Using Selenium & Python
In my free time I learn and code with Python. Selenium is an excellent tool to automate almost anything on the web.
What is Selenium?
Selenium refers to a number of different open-source projects used for browser automation. It supports bindings for all major programming languages, including our favorite language: Python.
The Selenium API uses the WebDriver protocol to control a web browser, like Chrome, Firefox or Safari. The browser can run either locally or remotely.
At the beginning of the project (almost 20 years ago!) it was mostly used for cross-browser, end-to-end testing (acceptance tests).
Now it is still used for testing, but it is also used as a general browser automation platform. And of course, it is used for web scraping!
Selenium is useful when you have to perform an action on a website such as:
- Clicking on buttons
- Filling forms
- Scrolling
- Taking a screenshot
It is also useful for executing JavaScript code. Let's say that you want to scrape a single-page application and you haven't found an easy way to call the underlying APIs directly. In this case, Selenium might be what you need.
Installation
We will use Chrome in our example, so make sure you have the following installed on your local machine:
- Chrome download page
- Chrome driver binary
- selenium package
To install the Selenium package, as always, I recommend that you create a virtual environment (for example using virtualenv) and then:
pip install selenium
Open a Web Page
Once you have downloaded both Chrome and Chromedriver and installed the Selenium package, you should be ready to start the browser:
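from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
driver.get('https://google.com')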
This will launch Chrome in headful mode (a regular, visible Chrome window that is controlled by your Python code). You should see a message stating that the browser is controlled by automated software.
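The snippet above never closes the browser, and a script that fails halfway would leave a Chrome window behind. Here is a minimal sketch of one way to guarantee cleanup. The names `run_with_driver`, `make_driver`, `task` and `fetch_title` are hypothetical helpers made up for illustration, not part of Selenium; the only Selenium calls used are `driver.get()`, `driver.title` and `driver.quit()`.

```python
def run_with_driver(make_driver, task):
    """Create a driver, run task(driver), and always quit the browser.

    make_driver: zero-argument callable that returns a WebDriver
                 (hypothetical parameter, build the driver however you like).
    task:        callable that receives the driver and returns a result.
    """
    driver = make_driver()
    try:
        return task(driver)
    finally:
        # Runs on success and on error, so no browser window is left open.
        driver.quit()


def fetch_title(driver, url="https://google.com"):
    """Example task: open a page and return its title."""
    driver.get(url)
    return driver.title
```

To use it, pass a function that builds the Chrome driver the same way as in the example above, together with `fetch_title` or a task of your own.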
Show more data
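from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
driver.get("https://www.nintendo.com/")
driver.maximize_window()
print(driver.page_source)  # full HTML of the page
print(driver.title)        # page title
print(driver.current_url)  # URL after any redirections
driver.quit()              # close the browser and end the session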
Here are three interesting WebDriver properties used in the example above:
- driver.page_source returns the full HTML code of the page.
- driver.title gets the page's title.
- driver.current_url gets the current URL (this can be useful when there are redirections on the website and you need the final URL).
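To make these three properties a bit more concrete, here is a small sketch that gathers them into a dictionary and checks whether a redirection happened. `page_summary` and `was_redirected` are hypothetical helper names chosen for this post, and ignoring trailing slashes when comparing URLs is my own simplification.

```python
def page_summary(driver):
    """Collect title, final URL and HTML size of the current page."""
    html = driver.page_source
    return {
        "title": driver.title,
        "url": driver.current_url,
        "html_length": len(html),
    }


def was_redirected(driver, requested_url):
    """Return True if the browser ended up on a different URL.

    Trailing slashes are ignored (a simplification), so
    'https://example.com' and 'https://example.com/' count as equal.
    """
    return driver.current_url.rstrip("/") != requested_url.rstrip("/")
```

Both functions only read properties, so they can be called at any point after `driver.get(...)` and before `driver.quit()`.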
Locating Elements
Locating data on a website is one of the main use cases for Selenium, either for a test suite or to extract data and save it for further analysis (web scraping).
There are many methods available in the Selenium API to select elements on the page. You can use:
- Tag name
- Class name
- IDs
- XPath
- CSS selectors
As usual, the easiest way to locate an element is to open Chrome DevTools and inspect the element that you need. A cool shortcut for this is to hover over the element you want with your mouse and then press Ctrl + Shift + C (Cmd + Shift + C on macOS), instead of having to right-click and choose Inspect each time.
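The strategies listed above map onto constants of Selenium's `By` class, which are passed to `driver.find_element(strategy, value)`. The sketch below shows one locator per strategy. The tag, class and ID values (`h1`, `headline`, `main-title`) are hypothetical and need to be replaced with what you actually see in DevTools; `texts_for` and `example_locators` are helper names I made up for illustration.

```python
def texts_for(driver, locators):
    """Return the visible text of the first match for each locator.

    locators: dict mapping a label to a (strategy, value) pair, where
              strategy is one of Selenium's By constants.
    """
    return {
        label: driver.find_element(strategy, value).text
        for label, (strategy, value) in locators.items()
    }


def example_locators():
    """One locator per strategy; all values are hypothetical page details."""
    # Imported here so texts_for() can be used without this import.
    from selenium.webdriver.common.by import By

    return {
        "by_tag": (By.TAG_NAME, "h1"),
        "by_class": (By.CLASS_NAME, "headline"),
        "by_id": (By.ID, "main-title"),
        "by_xpath": (By.XPATH, "//h1[@id='main-title']"),
        "by_css": (By.CSS_SELECTOR, "h1#main-title.headline"),
    }
```

With a running driver, `texts_for(driver, example_locators())` would return one text per strategy. Keep in mind that `find_element` raises `NoSuchElementException` when nothing matches, so a wrong selector fails loudly instead of returning an empty result.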
WebElement
A WebElement is a Selenium object representing an HTML element.
There are many actions that you can perform on those HTML elements. Here are the most useful:
- Accessing the text of the element with the property element.text
- Clicking on the element with element.click()
- Accessing an attribute with element.get_attribute('class')
- Sending text to an input with element.send_keys('mypassword')
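Put together, these actions are enough to fill and submit a simple login form. The following is only a sketch: `submit_login` is a hypothetical helper, and it assumes the three elements were already located, for example with `driver.find_element(By.ID, 'username')` where the ID is whatever the real page uses.

```python
def submit_login(username_input, password_input, submit_button,
                 username, password):
    """Type credentials into two inputs and click the submit button.

    The three element arguments are WebElements located beforehand.
    Returns the button's text and class attribute, read before the click.
    """
    # clear() removes any pre-filled value before typing.
    username_input.clear()
    username_input.send_keys(username)
    password_input.clear()
    password_input.send_keys(password)

    # Read these first: after the click the page may change and the
    # old element reference may no longer be usable.
    label = submit_button.text
    css_class = submit_button.get_attribute("class")

    submit_button.click()
    return label, css_class
```

Passing the elements in, rather than locating them inside the function, keeps the selectors in one place and makes the helper reusable across pages with different markup.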
Taking a screenshot
You can take a screenshot of the current browser window using:
driver.save_screenshot('screenshot.png')
Executing JavaScript
Sometimes, you may need to execute some JavaScript on the page. For example, let's say you want to take a screenshot of some information, but you first need to scroll a bit to see it. You can easily do this with Selenium:
scroll_script = "window.scrollBy(0, 1000);"
driver.execute_script(scroll_script)
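Here is a rough sketch of the scroll-then-screenshot idea described above. `scroll_and_capture` and `scroll_position` are hypothetical helper names; they rely on two behaviours of `execute_script`: extra Python arguments are available in the script as `arguments[0]`, `arguments[1]` and so on, and a value returned by the script comes back to Python.

```python
def scroll_and_capture(driver, pixels, path):
    """Scroll down by `pixels`, then save a screenshot to `path`.

    Returns the result of save_screenshot (True when the file was written).
    """
    # `pixels` is handed to the script as arguments[0].
    driver.execute_script("window.scrollBy(0, arguments[0]);", pixels)
    return driver.save_screenshot(path)


def scroll_position(driver):
    """Ask the page for its current vertical scroll offset."""
    # The script's return value is converted to a Python value.
    return driver.execute_script("return window.pageYOffset;")
```

Passing the number as an argument instead of building the script string by hand avoids quoting mistakes when the value comes from somewhere else in your code.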
If you perform repetitive tasks like filling forms or checking information behind a login form where the website doesn't have an API, it may be a good idea to automate them with Selenium. Just don't forget this post.
It is a good tutorial, keep writing
Good guide