Automate Websites Using Selenium & Python

about 3 years ago (Edited)

In my free time I learn and code with Python. Selenium is an excellent tool for automating almost anything on the web.

What is Selenium?

Selenium refers to a number of different open-source projects used for browser automation. It supports bindings for all major programming languages, including our favorite language: Python.

The Selenium API uses the WebDriver protocol to control a web browser, like Chrome, Firefox or Safari. The browser can run either locally or remotely.

At the beginning of the project (almost 20 years ago!) it was mostly used for cross-browser, end-to-end testing (acceptance tests).

Now it is still used for testing, but it is also used as a general browser automation platform. And of course, it is used for web scraping!

Selenium is useful when you have to perform an action on a website such as:

Clicking on buttons
Filling forms
Scrolling
Taking a screenshot

It is also useful for executing JavaScript code. Let's say that you want to scrape a Single Page Application and you haven't found an easy way to call the underlying APIs directly. In this case, Selenium might be what you need.

Installation

We will use Chrome in our example, so make sure you have the following on your local machine:

Chrome download page
Chrome driver binary
selenium package

To install the Selenium package, as always, I recommend that you create a virtual environment (for example using virtualenv) and then:

pip install selenium
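
A quick way to confirm the install worked is to ask Python which version it can see. This is only a small sketch: `installed_version` is a helper name I made up for the example, and it uses the standard library's `importlib.metadata`, so it assumes Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper for this post,
    not something Selenium provides.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("selenium")
if found is None:
    print("selenium is not installed in this environment")
else:
    print(f"selenium {found} is installed")
```

If it reports that Selenium is missing, the virtual environment you installed into is probably not the one that is currently active.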

Open a Web Page

Once you have downloaded both Chrome and Chromedriver and installed the Selenium package, you should be ready to start the browser:

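from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
driver.get('https://google.com')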

This will launch Chrome in headful mode (a regular Chrome window, but one controlled by your Python code). You should see a message stating that the browser is controlled by automated software.

Show more data

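from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
driver.get("https://www.nintendo.com/")
driver.maximize_window()
print(driver.page_source)   # full HTML of the page
print(driver.title)         # page title
print(driver.current_url)   # final URL after any redirects
driver.quit()               # close the browser and end the session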

Here are three other interesting WebDriver properties:

driver.page_source will return the full page HTML code.
driver.title gets the page's title
driver.current_url gets the current URL (this can be useful when there are redirections on the website and you need the final URL)
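
Since `page_source` can be huge, printing it is rarely what you want. As a rough sketch, the three properties can be gathered into a small summary instead. `describe_page` is a hypothetical helper (not part of Selenium), and it assumes the driver has already loaded the URL you pass in.

```python
def describe_page(driver, requested_url):
    """Collect basic facts about the page the driver is currently showing.

    `driver` is any Selenium WebDriver that has already loaded `requested_url`.
    """
    final_url = driver.current_url
    return {
        "title": driver.title,
        "final_url": final_url,
        # True when the site sent us somewhere other than the URL we asked for
        # (a trailing slash alone does not count as a redirect here).
        "redirected": final_url.rstrip("/") != requested_url.rstrip("/"),
        # Report the size of the HTML instead of dumping all of it.
        "html_length": len(driver.page_source),
    }
```

With the Nintendo example, calling `describe_page(driver, "https://www.nintendo.com/")` right after `driver.get(...)` would tell you whether the site redirected you.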

Locating Elements

Locating data on a website is one of the main use cases for Selenium, either for a test suite or to extract data and save it for further analysis (web scraping).

There are many methods available in the Selenium API to select elements on the page. You can use:

Tag name
Class name
IDs
XPath
CSS selectors

As usual, the easiest way to locate an element is to open the Chrome dev tools and inspect the element that you need. A cool shortcut for this is to highlight the element you want with your mouse and then press Ctrl + Shift + C (Cmd + Shift + C on macOS), instead of having to right-click and choose Inspect each time.

WebElement

A WebElement is a Selenium object representing an HTML element.

There are many actions that you can perform on those HTML elements. Here are the most useful:

Accessing the text of the element with the property element.text
Clicking on the element with element.click()
Accessing an attribute with element.get_attribute('class')
Sending text to an input with element.send_keys('mypassword')
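
Put together, those actions look roughly like this. It's a sketch under a few assumptions: `summarize` and `fill_and_submit` are made-up helper names, and the elements are expected to come from something like `driver.find_element(...)` on a page that really has an input and a button.

```python
def summarize(element):
    """Read the visible text and the class attribute of a WebElement."""
    return {"text": element.text, "class": element.get_attribute("class")}


def fill_and_submit(field, button, value):
    """Type `value` into an input element, then click a button element."""
    field.clear()           # remove any text already in the input
    field.send_keys(value)  # type the new value
    button.click()          # submit by clicking the button
```

For a login form you would first look up the password input and the submit button, then call something like `fill_and_submit(password_input, submit_button, 'mypassword')`, where both variable names are placeholders for elements you located yourself.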

Taking a screenshot

You can take a screenshot of the current window with:

driver.save_screenshot('screenshot.png')
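
If you take screenshots often, a fixed file name gets overwritten quickly. One option (just a sketch, with `capture` and the `screenshots` folder being my own assumptions) is to save each image under a timestamped name and to check the return value of `save_screenshot`, which is False when the file could not be written.

```python
from datetime import datetime
from pathlib import Path


def capture(driver, directory="screenshots"):
    """Save a timestamped PNG of the current window and return its path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{datetime.now():%Y%m%d-%H%M%S}.png"
    # save_screenshot returns False if the file could not be written.
    if not driver.save_screenshot(str(target)):
        raise OSError(f"could not write {target}")
    return target
```

The timestamp only has one-second resolution, so two captures within the same second would still share a name.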

Executing JavaScript

Sometimes, you may need to execute some JavaScript on the page. For example, let's say you want to take a screenshot of some information, but you first need to scroll a bit to see it. You can easily do this with Selenium:

# Scroll the page down by 1000 pixels
script = "window.scrollBy(0, 1000);"
driver.execute_script(script)
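
`execute_script` can do more than fire a fixed snippet: the script can return a value to Python, and extra arguments you pass are available in JavaScript as `arguments[0]`, `arguments[1]` and so on. A minimal sketch, where `scroll_to_bottom` and `scroll_into_view` are helper names invented for this post:

```python
def scroll_to_bottom(driver):
    """Scroll to the end of the page and return the page height in pixels."""
    # A script can hand a value back to Python with `return`.
    height = driver.execute_script("return document.body.scrollHeight;")
    # Extra Python arguments show up in JavaScript as arguments[0], arguments[1], ...
    driver.execute_script("window.scrollTo(0, arguments[0]);", height)
    return height


def scroll_into_view(driver, element):
    """Bring a WebElement into the visible part of the page."""
    # A WebElement passed as an argument arrives in the script as a DOM element.
    driver.execute_script("arguments[0].scrollIntoView();", element)
```

On pages that load more content as you scroll, the height may grow after the first call, so you might need to call `scroll_to_bottom` more than once before taking the screenshot.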

If you perform repetitive tasks, like filling in forms or checking information behind a login on a website that doesn't offer an API, it may be a good idea to automate them with Selenium. Just don't forget this post.

Find me here:

Peakd / Hive

DTube

Odysee

Youtube

Instagram

3 comments

@coffeelovers 64

about 3 years ago

It is a good tutorial, keep writing

@prolinuxua 68

about 3 years ago

Good manual
