Automate Websites Using Selenium & Python


In my free time I learn and code with Python. Selenium is an excellent tool to automate almost anything on the web.


What is Selenium?

Selenium refers to a number of different open-source projects used for browser automation. It supports bindings for all major programming languages, including our favorite language: Python.

The Selenium API uses the WebDriver protocol to control a web browser, such as Chrome, Firefox or Safari. The browser can run either locally or remotely.

At the beginning of the project (almost 20 years ago!) it was mostly used for cross-browser, end-to-end testing (acceptance tests).

It is still used for testing, but it now also serves as a general browser automation platform. And of course, it is used for web scraping!

Selenium is useful when you have to perform an action on a website such as:

  • Clicking on buttons
  • Filling forms
  • Scrolling
  • Taking a screenshot

It is also useful for executing JavaScript code. Say you want to scrape a single-page application and you haven't found an easy way to call the underlying APIs directly. In this case, Selenium might be what you need.


Installation

We will use Chrome in our example, so make sure you have the following installed on your local machine:

  • Chrome download page
  • ChromeDriver binary
  • selenium package

To install the Selenium package, as always, I recommend that you create a virtual environment (for example using virtualenv) and then:

pip install selenium

Open a Web Page

Once you have downloaded both Chrome and ChromeDriver and installed the Selenium package, you should be ready to start the browser:

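from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the ChromeDriver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path='chromedriver.exe'))
driver.get('https://google.com')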

This will launch Chrome in headful mode (a regular Chrome window, controlled by your Python code). You should see a message stating that the browser is being controlled by automated software.
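When the script runs on a server or in the background, a visible window is often unnecessary. Here is a minimal sketch of the headless counterpart. It assumes Selenium 4 and a recent Chrome: `--headless=new` is Chrome's own flag, and older Chrome versions expect plain `--headless`. It also assumes ChromeDriver is on your PATH (or Selenium 4.6+, which can locate a driver itself). The helper names `chrome_arguments` and `open_chrome` are made up for this example.

```python
def chrome_arguments(headless=True, width=1920, height=1080):
    """Return the Chrome command-line flags for the requested mode."""
    args = [f"--window-size={width},{height}"]
    if headless:
        # Chrome's own flag; older Chrome versions use plain "--headless".
        args.append("--headless=new")
    return args


def open_chrome(headless=True):
    """Start Chrome with the flags above and return the driver."""
    # Imported here so chrome_arguments() also works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `driver = open_chrome()` should give you a driver you can use exactly like the one above (`driver.get(...)`, `driver.quit()`), only without a window on screen.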


Show more data

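from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the ChromeDriver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path='chromedriver.exe'))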
driver.get("https://www.nintendo.com/")
driver.maximize_window()
print(driver.page_source)
print(driver.title)
print(driver.current_url)
driver.quit()

The example above uses three useful WebDriver properties:

  • driver.page_source returns the full HTML of the page.
  • driver.title returns the page's title.
  • driver.current_url returns the current URL (useful when the website redirects and you need the final URL).
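To make the redirect case concrete, here is a small sketch. The helper names `final_url` and `was_redirected` are made up for this example, and the code assumes the redirect has finished by the time `driver.get()` returns. That is normally true for HTTP redirects, but not necessarily for redirects triggered later by JavaScript.

```python
def final_url(driver, url):
    """Load url and return the address the browser ended up on."""
    driver.get(url)
    # current_url reflects the page that is loaded now, after any redirects.
    return driver.current_url


def was_redirected(requested, final):
    """True when the final address differs from the requested one."""
    # Ignore a trailing slash so "site.com" and "site.com/" count as equal.
    return requested.rstrip("/") != final.rstrip("/")
```

With an open driver (before calling `driver.quit()`), `final_url(driver, "https://google.com")` returns whatever address the browser actually landed on, and `was_redirected` tells you whether it differs from the one you asked for.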

Locating Elements

Locating data on a website is one of the main use cases for Selenium, either for a test suite or to extract data and save it for further analysis (web scraping).

There are many methods available in the Selenium API to select elements on the page. You can use:

  • Tag name
  • Class name
  • IDs
  • XPath
  • CSS selectors
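In Selenium 4, each of these strategies is a constant on the `By` class that you pass to `find_element` (first match) or `find_elements` (all matches). The sketch below tries one locator per strategy and counts the matches. The selector values (`h1`, `title`, `main` and so on) are placeholders rather than selectors known to exist on the page, and `count_matches` and `demo` are names made up for this example. It assumes ChromeDriver is on your PATH or Selenium 4.6+.

```python
def count_matches(driver, locators):
    """Map each label to the number of elements its locator finds."""
    # find_elements returns an empty list when nothing matches,
    # whereas find_element would raise NoSuchElementException.
    return {
        label: len(driver.find_elements(by, value))
        for label, (by, value) in locators.items()
    }


def demo():
    """Try one locator per strategy on a live page (needs Chrome)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Placeholder selectors: replace them with values taken from
    # the page you inspect in the dev tools.
    locators = {
        "tag name": (By.TAG_NAME, "h1"),
        "class name": (By.CLASS_NAME, "title"),
        "id": (By.ID, "main"),
        "xpath": (By.XPATH, "//a[@href]"),
        "css selector": (By.CSS_SELECTOR, "nav a"),
    }
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.nintendo.com/")
        for label, count in count_matches(driver, locators).items():
            print(f"{label}: {count} element(s)")
    finally:
        driver.quit()
```

Call `demo()` to run it against a real browser. A count of 0 simply means the placeholder selector does not match anything on that page.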

As usual, the easiest way to locate an element is to open the Chrome dev tools and inspect the element that you need. A handy shortcut is to press Ctrl + Shift + C (Cmd + Shift + C on macOS) and then click the element you want, instead of having to right-click and choose Inspect each time.


WebElement

A WebElement is a Selenium object representing an HTML element.

There are many actions that you can perform on those HTML elements. Here are the most useful:

  • Accessing the text of the element with the property element.text
  • Clicking on the element with element.click()
  • Accessing an attribute with element.get_attribute('class')
  • Sending text to an input with: element.send_keys('mypassword')
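Put together, the four actions above could look like this on a login form. This is only a sketch: the URL, the field names `username` and `password`, the submit-button selector and the credentials are all hypothetical, and `log_in` and `demo` are names made up for this example. It assumes ChromeDriver is on your PATH or Selenium 4.6+.

```python
def log_in(user_field, password_field, submit_button, username, password):
    """Fill both inputs, click submit, and return the username as typed."""
    user_field.clear()                  # remove any pre-filled text
    user_field.send_keys(username)      # type into the input
    password_field.clear()
    password_field.send_keys(password)
    # Read the input's current value back before leaving the page.
    typed = user_field.get_attribute("value")
    submit_button.click()
    return typed


def demo():
    """Run the login sketch in a real browser (all locators are placeholders)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")  # placeholder URL
        typed = log_in(
            driver.find_element(By.NAME, "username"),
            driver.find_element(By.NAME, "password"),
            driver.find_element(By.CSS_SELECTOR, "button[type='submit']"),
            "myuser",
            "mypassword",
        )
        print("Typed username:", typed)
        # .text returns the visible text of an element.
        print("Heading:", driver.find_element(By.TAG_NAME, "h1").text)
    finally:
        driver.quit()
```

Keeping `log_in` separate from the locators means you only need to change the three `find_element` calls when you adapt it to a real site.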

Taking a screenshot

You can take a screenshot of the current window with:

driver.save_screenshot('screenshot.png')

Executing JavaScript

Sometimes you may need to execute some JavaScript on the page. For example, say you want to take a screenshot of some information, but you first need to scroll down a bit to see it. You can easily do this with Selenium:

# Scroll the page down by 1000 pixels
script = "window.scrollBy(0, 1000);"
driver.execute_script(script)
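Scrolling by a fixed number of pixels works when you know the page layout. When you already hold the element you care about, a possible alternative is to pass it to the script and let the browser scroll to it. `execute_script` exposes extra arguments to the script as `arguments[0]`, `arguments[1]` and so on, and a script that uses `return` hands its value back to Python. The helper names below are made up for this sketch.

```python
def screenshot_after_scroll(driver, element, path):
    """Scroll element into view, then save a screenshot of the window."""
    # arguments[0] is the first extra argument passed to execute_script.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    # save_screenshot returns True when the file was written.
    return driver.save_screenshot(path)


def scroll_position(driver):
    """Return how many pixels the page is scrolled down."""
    # A script that uses "return" sends its value back to Python.
    return driver.execute_script("return window.pageYOffset;")
```

Here `element` is any WebElement you located earlier, for example with `driver.find_element(...)`, and `path` is a file name such as 'screenshot.png'.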

If you perform repetitive tasks, such as filling in forms or checking information behind a login form on a website that doesn't have an API, it may be a good idea to automate them with Selenium. Keep this post in mind when that happens.




