Using Python & BeautifulSoup To Obtain DJ Magazine Top 100 DJs 2021


Hi there. In this programming post, I use Python and the Beautiful Soup package to scrape the 2021 Top 100 DJs list from the DJ Magazine website.



 

Topics


  • About DJ Magazine Top 100 DJs
  • Python & Beautiful Soup Setup
  • Webscrape & Extract The Top 100 DJs

 

About DJ Magazine Top 100 DJs


DJ Magazine (DJ Mag) was founded in 1991 and is dedicated to electronic dance music (EDM). From 1991 to 1996, the number 1 DJ was chosen by DJ Magazine itself. Public voting started in 1997 and continues to this day.

Trance producer and DJ Armin van Buuren has been voted #1 the most times, with five wins. David Guetta, Martin Garrix and Tiësto have each been voted #1 three times.



 

Python & Beautiful Soup Setup


Setting up Beautiful Soup in Python is not too difficult. I use Jupyter Notebook to run the code. The URL points to the most recent Top 100 DJs page; at the time of writing, that is the Top 100 DJs list for 2021.

# Imports

from bs4 import BeautifulSoup
import requests
import pandas as pd

# URL for the DJ Mag Top 100 DJs page
url = "https://djmag.com/top100djs/"

# Get Request
response = requests.get(url)

# Get the soup
soup = BeautifulSoup(response.content, 'html.parser')
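
The request above assumes the page loads without problems. As a rough sketch, the fetch could be wrapped in a small helper that sets a timeout and raises on HTTP errors. The helper name fetch_soup and the 10 second timeout are my own illustrative choices, not part of the original notebook.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed as a BeautifulSoup object.

    fetch_soup is a hypothetical helper; the timeout value is an assumption.
    """
    # A timeout stops the request from hanging indefinitely.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


# Example usage (performs a real network request, so it is left commented out):
# soup = fetch_soup("https://djmag.com/top100djs/")
```

If the request fails, the exception makes the problem visible right away instead of producing empty lists later on.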

 

Webscrape & Extract The Top 100 DJs


For this webscraping project, I wanted to extract the name of each DJ, the rank change from the previous year and the interview URL. The DJ rank is not scraped; because the names are extracted in rank order, it can be added to the dataframe as range(1, 101).

For data extraction I use list comprehensions. They are convenient in Python because they take fewer lines of code than initializing an empty list and appending to it inside a for loop.
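
To make the comparison concrete, here is a small sketch of both styles producing the same list. The sample strings stand in for scraped text and are only for illustration.

```python
# Sample data standing in for scraped text (illustrative, not scraped here).
raw_names = ["  David Guetta ", "Martin Garrix\n", " Alok"]

# For-loop version: start with an empty list and append to it.
names_loop = []
for name in raw_names:
    names_loop.append(name.strip())

# List comprehension version: the same result in one line.
names_comprehension = [name.strip() for name in raw_names]

print(names_loop == names_comprehension)  # True
```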

Extract DJ Names

From the soup object, the DJ names are in div tags with the class top100dj-name. For each matching div x, I find the a tag and take its text to get the DJ name. This list comprehension extracts all 100 DJ names.

# DJ names are in div tags with class top100dj-name; find the a tag and get its text.

dj_names = [x.find('a').get_text().strip() for x in soup.find_all('div', {'class': 'top100dj-name'})]
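
The comprehension above can be tried offline on a small HTML string. The fragment below is a hypothetical one modeled on the class name described above, so the real page markup may differ. The sketch also skips any div without an a tag, since find('a') returns None in that case and calling get_text() on None would raise an error.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment modeled on the class name described above;
# the real page markup may differ.
sample_html = """
<div class="top100dj-name"><a href="david-guetta">David Guetta</a></div>
<div class="top100dj-name"><a href="martin-garrix"> Martin Garrix </a></div>
<div class="top100dj-name">No link here</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns None when a div has no a tag, so skip those divs
# instead of calling get_text() on None.
sample_names = [
    div.find("a").get_text().strip()
    for div in sample_soup.find_all("div", {"class": "top100dj-name"})
    if div.find("a") is not None
]

print(sample_names)  # ['David Guetta', 'Martin Garrix']
```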

 

dj_name_screenshot.PNG

 

Rank Movements

Rank movements for each DJ are in div tags with the class top100dj-movement. For each matching div x returned by soup.find_all('div', {'class': 'top100dj-movement'}), the text is extracted and stripped of surrounding whitespace.

# Rank Movements:

rank_movements = [x.get_text().strip() for x in soup.find_all('div', {'class': 'top100dj-movement'})]
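
The rank movements come back as text labels such as "Up 1", "Down 3" or "Non Mover". If numbers are easier to work with, one possible sketch is to convert them to signed integers. The helper name parse_movement is hypothetical, and any label it does not recognize is returned as None rather than guessed.

```python
import re


def parse_movement(text):
    """Convert a movement label to a signed integer, or None if unrecognized.

    parse_movement is a hypothetical helper; positive means the DJ moved up.
    """
    label = text.strip().lower()
    if label == "non mover":
        return 0
    match = re.fullmatch(r"(up|down)\s+(\d+)", label)
    if match is None:
        # Any other label is left as None rather than guessed.
        return None
    places = int(match.group(2))
    return places if match.group(1) == "up" else -places


sample_movements = ["Non Mover", "Up 1", "Down 3", "Stay 8"]
print([parse_movement(m) for m in sample_movements])  # [0, 1, -3, None]
```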

 

dj_rank_movement_screenshot.PNG

 

Interview URLs

For the interview URLs, the href values do not include "https://djmag.com/top100djs/" at the start, so I prepend it to each extracted href.

For each matching div x, I get the href from the DJ's a tag.

# Interview URLs, adding https://djmag.com/top100djs/

interview_urls = ["https://djmag.com/top100djs/" + x.find('a')['href'] 
                  for x in soup.find_all('div', {'class': 'top100dj-name'})]
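
String concatenation works as long as every href is a relative path of the expected shape. As an alternative sketch, urljoin from Python's standard urllib.parse module resolves an href against a base URL and also copes with hrefs that start with a slash. The href values below are hypothetical examples, not copied from the page.

```python
from urllib.parse import urljoin

base_url = "https://djmag.com/top100djs/"

# Hypothetical href values; the real ones on the page may look different.
relative_href = "2021/1/david-guetta"
root_relative_href = "/top100djs/2021/1/david-guetta"

# urljoin resolves both forms against the base URL without duplicating path parts.
print(urljoin(base_url, relative_href))
print(urljoin(base_url, root_relative_href))
# Both print: https://djmag.com/top100djs/2021/1/david-guetta
```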

 

Create Dataframe

The pandas dataframe can be built from the extracted lists with pd.DataFrame().

# Create dataframe:
DJMag_top100_2021_df = pd.DataFrame({
    'Rank': range(1, 101),
    'DJ Name': dj_names,
    'Rank Change': rank_movements,
    'Interview URL': interview_urls
})
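
pd.DataFrame() raises a ValueError when the lists in the dictionary have different lengths, which can happen if the page layout changes and one list comes back shorter. A minimal sketch of a check that could run before building the dataframe is shown here. The helper check_equal_lengths is hypothetical and is not a pandas function.

```python
def check_equal_lengths(**columns):
    """Raise ValueError if the named lists differ in length.

    check_equal_lengths is a hypothetical helper, not a pandas function.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # Return the shared length (0 if no columns were passed).
    return next(iter(lengths.values()), 0)


# Small stand-in lists; in the notebook these would be the scraped lists.
sample_names = ["David Guetta", "Martin Garrix", "Armin van Buuren"]
sample_movements = ["Non Mover", "Up 1", "Up 1"]

n = check_equal_lengths(names=sample_names, movements=sample_movements)
ranks = list(range(1, n + 1))
print(ranks)  # [1, 2, 3]
```

Using range(1, n + 1) instead of a fixed range(1, 101) keeps the rank column aligned with the other columns if the page ever returns a different number of entries.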

 

Running DJMag_top100_2021_df.head(10) gives the top 10 DJs from 2021 in the dataframe.

top10_DJs_2021.PNG

Here are the top 10 DJs from the 2021 DJ Magazine vote in table format.

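| Rank | DJ | Rank Change | Interview URL |
|------|----|-------------|---------------|
| 1 | David Guetta | Non Mover | Interview |
| 2 | Martin Garrix | Up 1 | Interview |
| 3 | Armin van Buuren | Up 1 | Interview |
| 4 | Alok | Up 1 | Interview |
| 5 | Dimitri Vegas & Like Mike | Down 3 | Interview |
| 6 | Afrojack | Up 1 | Interview |
| 7 | Don Diablo | Down 1 | Interview |
| 8 | Oliver Heldens | Stay 8 | Interview |
| 9 | Timmy Trumpet | Up 1 | Interview |
| 10 | Steve Aoki | Down 1 | Interview |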

Thank you for reading.

Posted with STEMGeeks


