TIL Python Basics Day 46 - Scraping the Billboard Hot 100

Dayeon Lee · January 29, 2021

Udemy Python Course


Learning

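How to find an element's hierarchy/path in the page source

  • Locating the element: right-click the tag in the browser's inspector and copy its CSS selector.
    Either inspect the element and copy the CSS selector/path,
    or copy the tag and class names directly from the HTML.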


e.g.

# CSS path copied from the inspector:
# li.chart-list__element:nth-child(1) > button:nth-child(1) > span:nth-child(2) > span:nth-child(1)

# The same hierarchy written out with full class names:
# li.chart-list__element.display--flex
# button.chart-element__wrapper.display--flex.flex--grow.sort--this-week
# span.chart-element__information
# span.chart-element__information__song.text--truncate.color--primary

# Option 1 or option 2: either works fine
# Option 1: find_all() with a single class name
titles = soup.find_all("span", class_="chart-element__information__song")

# Option 2: select() with a CSS selector
titles = soup.select("li button span span.chart-element__information__song.text--truncate.color--primary")
# print(titles)
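
To see how the copied inspector path differs from a class-based selector, here is a minimal sketch run against a small inline HTML sample. The sample markup is an assumption that only mirrors the class names quoted above (the `chart-element__rank` span is hypothetical), so it is not the real Billboard page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the class names in this post;
# the real page is larger and may differ.
SAMPLE_HTML = """
<ol>
  <li class="chart-list__element display--flex">
    <button class="chart-element__wrapper display--flex flex--grow sort--this-week">
      <span class="chart-element__rank">1</span>
      <span class="chart-element__information">
        <span class="chart-element__information__song text--truncate color--primary">Song A</span>
      </span>
    </button>
  </li>
  <li class="chart-list__element display--flex">
    <button class="chart-element__wrapper display--flex flex--grow sort--this-week">
      <span class="chart-element__rank">2</span>
      <span class="chart-element__information">
        <span class="chart-element__information__song text--truncate color--primary">Song B</span>
      </span>
    </button>
  </li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Path as copied from the inspector: ":nth-child(1)" on the <li>
# pins the match to the first chart row only.
copied_path = (
    "li.chart-list__element:nth-child(1) > button:nth-child(1) "
    "> span:nth-child(2) > span:nth-child(1)"
)
first_only = [tag.get_text() for tag in soup.select(copied_path)]

# Dropping the positional parts and keeping the class names matches every row.
class_based = "li.chart-list__element span.chart-element__information__song"
all_rows = [tag.get_text() for tag in soup.select(class_based)]

print(first_only)
print(all_rows)
```

In this sample the copied path matches only the first song, while the class-based selector matches both, which is why a copied path usually needs trimming before it is useful for scraping a whole list.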

find_all() or select()?

find_all() takes shorter arguments.
select() tends to need a long chain of tag and class names, which makes the code ugly.

  • Essentially it comes down to the use case and personal preference.
    - select() tends to produce ugly chains; however, it is said to be a little more efficient.
    - select() finds multiple instances and returns a list, equivalent to find_all().
    - find() returns the first match; select_one() is the equivalent of find().
    - I almost always use CSS selectors when chaining tags or using tag.classname; if looking for a single element without a class, I use find().
    (source: Stack Overflow)
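
The pairing described above (find_all() with select(), find() with select_one()) can be checked with a short sketch. The HTML snippet is a made-up two-song sample; only the song class name comes from this post.

```python
from bs4 import BeautifulSoup

# Hypothetical two-song snippet; only the song class name is from the post.
html = """
<ul>
  <li><span class="chart-element__information__song text--truncate">Song A</span></li>
  <li><span class="chart-element__information__song text--truncate">Song B</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# All matches: find_all() and select() both return list-like results.
via_find_all = soup.find_all("span", class_="chart-element__information__song")
via_select = soup.select("span.chart-element__information__song")

# First match only: find() and select_one() both return one Tag, or None.
via_find = soup.find("span", class_="chart-element__information__song")
via_select_one = soup.select_one("span.chart-element__information__song")

find_all_texts = [tag.get_text() for tag in via_find_all]
select_texts = [tag.get_text() for tag in via_select]

print(find_all_texts, select_texts)
print(via_find.get_text(), via_select_one.get_text())
```

Note that `class_="chart-element__information__song"` still matches spans that carry extra classes, so find_all() does not need the full class chain that the select() version spells out.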

#1
titles = soup.find_all("span", class_="chart-element__information__song")

#2
titles = soup.select("li button span span.chart-element__information__song.text--truncate.color--primary")

print(titles)

Final code

import requests
from bs4 import BeautifulSoup

URL = "https://www.billboard.com/charts/hot-100/"  # the date is appended, e.g. 2016-06-12
ask_date = input("Which year do you want to travel to? Type the date in this format YYYY-MM-DD: ")
response = requests.get(f"{URL}{ask_date}")
web_page = response.text

soup = BeautifulSoup(web_page, "html.parser")
# print(soup)

# Option 1 or option 2: either works fine; keep whichever you prefer.
# Option 1: find_all() with a single class name
titles = soup.find_all("span", class_="chart-element__information__song")

# Option 2: select() with a CSS selector (overwrites the result of option 1)
titles = soup.select("li button span span.chart-element__information__song.text--truncate.color--primary")
# print(titles)

# Pull the text out of each matched span
song_titles = [song.get_text() for song in titles]
print(song_titles)
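
As a possible refinement of the script above, the sketch below splits it into small functions so the parsing step can be tried on a string without a network call. The function names (`validate_date`, `fetch_chart_html`, `extract_titles`), the 10-second timeout, and the date check are my own assumptions rather than part of the course solution, and the class name is the one used in this post, which may no longer match the live page.

```python
from datetime import datetime

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.billboard.com/charts/hot-100/"
# Class name taken from this post; the live page may use different markup.
SONG_CLASS = "chart-element__information__song"


def validate_date(text):
    """Return the date as zero-padded YYYY-MM-DD, or raise ValueError."""
    return datetime.strptime(text, "%Y-%m-%d").date().isoformat()


def fetch_chart_html(date):
    """Download the chart page for a date (hypothetical helper, needs network)."""
    response = requests.get(f"{BASE_URL}{date}", timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def extract_titles(html):
    """Return the stripped text of every span carrying SONG_CLASS."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", class_=SONG_CLASS)
    return [span.get_text(strip=True) for span in spans]
```

With these in place, the original flow would be roughly `extract_titles(fetch_chart_html(validate_date(ask_date)))`. An empty list from `extract_titles` is a hint that the class name no longer matches the page.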

Linking the scraped titles to a Spotify playlist was part of the project; however, Spotify is not available in Korea. I tried with my existing UK account, though. The authorisation process was very difficult to understand, as I have zero knowledge of web authentication. URIs and tokens were things I came across for the first time.
I also found Spotify's API documentation poorly written.

Dayeon Lee | Django & Python Web Developer
