Web Scraping & Extracting The Bundesliga German Football Table In R Programming


Hi there. This post is a short technical guide to web scraping and extracting the Bundesliga (German football) league table in the R programming language.

Full R code can be found here for those interested.


Pixabay Image Source

 

Setup


I use RStudio on top of the R software for programming in R and for writing up posts offline.

For the libraries, I use dplyr, tidyr and rvest, plus readr for parsing text into integers. dplyr and tidyr are comparable to pandas in Python, and rvest is comparable to BeautifulSoup.

Installation of R packages requires the use of install.packages(). To install the rvest package in R, use install.packages("rvest").

To load packages in R, use library(). Use read_html() to read in the web page you want to extract data from.

# Reference: https://stackoverflow.com/questions/45450981/rvest-scrape-2-classes-in-1-tag

library(dplyr)
library(tidyr)
library(rvest)

bundesliga <- read_html("https://www.bundesliga.com/en/bundesliga/table")
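The installation step described above can be sketched as a small check that only installs what is missing. The helper name missing_packages is my own invention, not part of any package, and the install line is left commented out because it needs internet access.

```r
# Hypothetical helper (not from any package): return the names of the
# requested packages that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this post.
needed <- c("dplyr", "tidyr", "rvest", "readr")
to_install <- missing_packages(needed)

# Install only what is missing (uncomment to run; needs internet access).
# if (length(to_install) > 0) install.packages(to_install)
```

After this, the library() calls above should load without errors.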

 

bundesliga_feb7_2022.PNG

The item of interest here is the table from the Bundesliga table link. In Brave Browser, right-click on the page to open the context menu, then click Inspect at the bottom to see the HTML code. Make sure the Elements tab is selected. The Bundesliga league table is inside the tbody element.

tbody_screenshot.PNG

 

In R with rvest, the tbody element can be extracted with html_elements('tbody'). I use bundesliga %>% html_elements('tbody'), which operates the same as html_elements(bundesliga, 'tbody'). The pipe operator %>% is used throughout this post.

page <- bundesliga %>% html_elements('tbody')
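To see that the piped and non-piped forms are equivalent without hitting the live site, here is a minimal sketch on a made-up HTML fragment. The table contents are invented stand-ins, and minimal_html() is rvest's helper for building a small document from a string.

```r
library(rvest)

# Made-up stand-in for the real page, so the example runs offline.
doc <- minimal_html('
  <table>
    <tbody>
      <tr><td class="rank">1</td><td>Team A</td></tr>
      <tr><td class="rank">2</td><td>Team B</td></tr>
    </tbody>
  </table>')

# The two calls below select the same tbody element.
piped  <- doc %>% html_elements("tbody")
direct <- html_elements(doc, "tbody")

# Text of the rank cells inside the selected tbody.
ranks <- piped %>% html_elements("td.rank") %>% html_text2()
```

Both piped and direct hold a single tbody node, and further html_elements() calls can be chained on either one.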

 

Using html_nodes From rvest


I have found html_nodes() very helpful for extracting the columns of the league table. (In rvest 1.0.0 and later, html_nodes() is superseded by html_elements(), which works the same way; both still run.) I rely heavily on the browser's Inspect tool to find the classes for the Bundesliga teams, ranks, league points, wins, draws, losses, goals and goal difference.

Teams

For extracting the teams in the form of FC Bayern München, Borussia Dortmund, Bayer 04 Leverkusen, etc., use the class of d-none d-lg-inline from the span tag in each table row (tr). Here is the code chunk for extracting teams.

# Teams are in the class 'd-none d-lg-inline':

teams <- page %>% 
         html_nodes("[class='d-none d-lg-inline']") %>%
         html_text2()

teams_screenshot.PNG
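One detail worth knowing about the selector above: [class='d-none d-lg-inline'] compares the whole class attribute as an exact string, while the dotted form span.d-none.d-lg-inline matches any span that carries both classes, in any order and alongside others. A small sketch on an invented fragment (the team names and the extra class are made up for illustration):

```r
library(rvest)

# Invented fragment: the second span has the same two classes,
# but in a different order and with an additional class.
row <- minimal_html('
  <table><tbody><tr>
    <td><span class="d-none d-lg-inline">Team A</span></td>
    <td><span class="d-lg-inline d-none extra">Team B</span></td>
  </tr></tbody></table>')

# Exact match on the class attribute string: only the first span.
exact <- row %>%
  html_elements("[class='d-none d-lg-inline']") %>%
  html_text2()

# Class selector: every span that has both classes.
by_class <- row %>%
  html_elements("span.d-none.d-lg-inline") %>%
  html_text2()
```

The exact-match form is stricter, which can help when other elements share some of the same classes. The dotted form is more forgiving if the site adds or reorders classes.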

 

Rank

Rank can be obtained from the td tag inside the table row (tr) tag for each team. The class is rank.

# Rank & convert into integer:
team_rank <- page %>% 
             html_nodes("[class='rank']") %>%
             html_text2() %>% 
             readr::parse_integer()

 

ranks_screenshot.PNG

 

For the rest you can use the same code chunk and change the class. Full code can be found here.

  • Matches Played: class is matches.
  • Points: class is pts.
  • Wins: class is d-none d-lg-table-cell wins.
  • Draws: class is d-none d-lg-table-cell draws.
  • Losses: class is d-none d-lg-table-cell looses. The spelling mistake is in their HTML.
  • Goals: class is d-none d-md-table-cell goals.
  • Goal Difference: class is difference.
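Since every column follows the same pattern, the repeated chunk can be wrapped in a small function. This is only a sketch: extract_by_class is a name I made up, the sample rows and their numbers are invented so the code runs offline, and I use base R's as.integer() here where the rank chunk above uses readr::parse_integer().

```r
library(rvest)

# Hypothetical helper: text of every node whose class attribute is
# exactly `class_name`, mirroring the chunks used for teams and rank.
extract_by_class <- function(page, class_name) {
  selector <- paste0("[class='", class_name, "']")
  page %>% html_elements(selector) %>% html_text2()
}

# Invented sample rows standing in for the real tbody.
sample_page <- minimal_html('
  <table><tbody>
    <tr><td class="matches">3</td><td class="pts">9</td>
        <td class="d-none d-lg-table-cell wins">3</td></tr>
    <tr><td class="matches">3</td><td class="pts">4</td>
        <td class="d-none d-lg-table-cell wins">1</td></tr>
  </tbody></table>') %>% html_elements("tbody")

# Same call each time; only the class name changes.
matches <- as.integer(extract_by_class(sample_page, "matches"))
points  <- as.integer(extract_by_class(sample_page, "pts"))
wins    <- as.integer(extract_by_class(sample_page, "d-none d-lg-table-cell wins"))
```

On the real page, sample_page would be replaced by the page object created earlier.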

 

Creating A Dataframe


Once the parts are extracted, you can create a dataframe in R.

### Create Bundesliga dataframe:

bundes_df <- data.frame(Rank = team_rank, Team = teams, Points = points,
                        Played = matches, Wins = wins, Draws = draws,
                        Losses = losses, Goals = goals, 
                        Goal_Difference = goal_diff)

 

The Goals column combines Goals For and Goals Against in the format GF:GA. It can be split into two columns with separate() from R's tidyr package.


 

## Goals Column Separate Into Goals For and Goals Against:

bundes_df <- bundes_df %>% separate(Goals, c("Goals For", "Goals Against"))
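To show what separate() does to a GF:GA column, here is a minimal sketch on a toy data frame. The teams and scores are invented. Passing sep = ":" and convert = TRUE explicitly is my own choice for the example: the first names the delimiter, the second turns the resulting text columns into integers.

```r
library(tidyr)

# Toy data frame with invented values in the GF:GA format.
toy <- data.frame(Team = c("Team A", "Team B"),
                  Goals = c("10:3", "7:7"),
                  stringsAsFactors = FALSE)

# Split Goals at the colon into two columns and convert them to integers.
toy_split <- separate(toy, Goals,
                      into = c("Goals For", "Goals Against"),
                      sep = ":", convert = TRUE)
```

Because the new column names contain spaces, they need backticks or double brackets when referenced, for example toy_split[["Goals For"]].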

 



 

Saving Into A .csv File


Saving a dataframe in R to a .csv file is quite easy. You just need write.csv() with a dataframe and a file name.
I save the current Bundesliga table into a .csv file named with the current date. R's paste() function concatenates strings.

## Save Bundesliga Table Into A .csv file.

# Append ".csv" so the file name actually carries the extension:
write.csv(bundes_df, paste("Bundesliga_", Sys.Date(), ".csv", sep = ""))
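A quick way to check the saved file is to write it and read it straight back. The sketch below uses an invented two-row data frame and writes to R's temporary directory so it does not touch your working folder. The explicit .csv extension and row.names = FALSE (which leaves out the row-number column) are my own choices here.

```r
# Invented stand-in for the league table.
toy <- data.frame(Rank = 1:2, Team = c("Team A", "Team B"))

# Build a dated file name, e.g. Bundesliga_<today's date>.csv
file_name <- paste("Bundesliga_", Sys.Date(), ".csv", sep = "")
out_path <- file.path(tempdir(), file_name)

# Write without the row-number column, then read the file back.
write.csv(toy, out_path, row.names = FALSE)
read_back <- read.csv(out_path)
```

If read_back has the same columns and rows as the original, the file was written as intended.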

 



 

Thank you for reading.

Posted with STEMGeeks


