Crawling - Practice Melon-chart🍈

ํ™”์ดํ‹ฐ ยท2023๋…„ 12์›” 18์ผ


Step1: Import libraries

import requests as req
from bs4 import BeautifulSoup as bs
import pandas as pd
from datetime import datetime

Step2: Request and parse the chart page

# Browser-like User-Agent header (some sites reject the default requests one)
head_option = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
}
url = "https://www.melon.com/chart/index.htm"
res = req.get(url, headers=head_option)
# Parse the returned HTML with the lxml parser
html = bs(res.text, 'lxml')
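
The request above assumes the server answers with a normal page. As a minimal sketch (not part of the original post), the fetch could be wrapped in a helper that sets a timeout and fails loudly on an HTTP error. The name `fetch_html` and its `get` parameter are hypothetical; `get` only exists so the function can be tried without touching the network.

```python
import requests


def fetch_html(url, headers, timeout=10.0, get=requests.get):
    """Return the response body as text, raising on HTTP errors.

    `get` is a hypothetical hook for swapping in a fake during testing.
    """
    response = get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, the parsing line would become `html = bs(fetch_html(url, head_option), 'lxml')`.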

Step3: Find the title and artist lists

songs = html.select('div.ellipsis.rank01')
singers = html.select('div.ellipsis.rank02 > span.checkEllipsis')
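
To see what the two selectors match, here is a small sketch that runs them against a hand-written HTML snippet. The markup is an assumption: a simplified imitation of the chart rows, not the real Melon page. It also uses the built-in `html.parser` so the example does not depend on lxml. The point is that `div.ellipsis.rank02 > span.checkEllipsis` picks only the `span` that is a direct child, so an artist name repeated elsewhere in the same `div` is not counted twice.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup imitating two chart rows
SAMPLE_HTML = """
<div class="ellipsis rank01"><span><a href="#">Song A</a></span></div>
<div class="ellipsis rank02">
  <a href="#">Artist A</a>
  <span class="checkEllipsis"><a href="#">Artist A</a></span>
</div>
<div class="ellipsis rank01"><span><a href="#">Song B</a></span></div>
<div class="ellipsis rank02">
  <a href="#">Artist B</a>
  <span class="checkEllipsis"><a href="#">Artist B</a></span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every div carrying both classes "ellipsis" and "rank01"
titles = [tag.get_text(strip=True) for tag in soup.select("div.ellipsis.rank01")]

# Only span.checkEllipsis elements that are direct children of the rank02 div
artists = [
    tag.get_text(strip=True)
    for tag in soup.select("div.ellipsis.rank02 > span.checkEllipsis")
]

print(titles)
print(artists)
```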

Step4: Print the title and singer of each entry

# Print rank, title, and singer for each of the 100 chart entries
for idx in range(0, 100):
    title = songs[idx].text.strip('\n')
    singer = singers[idx].text
    print("{:03} {} / {}".format(idx + 1, title, singer))
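
The loop above indexes up to 100 entries, which raises an `IndexError` if the page returns fewer. As an alternative sketch (the helper name `format_chart` and the sample data are made up for illustration), `zip` and `enumerate` can pair the two lists without a hard-coded length:

```python
def format_chart(titles, artists):
    """Pair titles with artists and return one formatted line per entry."""
    lines = []
    # zip() stops at the shorter list, so a short page cannot cause IndexError
    for rank, (title, artist) in enumerate(zip(titles, artists), start=1):
        lines.append("{:03} {} / {}".format(rank, title.strip(), artist.strip()))
    return lines


# Hypothetical sample data standing in for the scraped lists
for line in format_chart(["Song A\n", "Song B"], ["Artist A", "Artist B"]):
    print(line)
```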

Step5: Create a DataFrame

list_song = [song.text.strip('\n') for song in songs]
list_singer = [singer.text for singer in singers]
data = pd.DataFrame(data=zip(range(1,101),list_song, list_singer), columns=['Rank', 'Title', 'Singer'])
data
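
The same table can also be built from a dict of columns, with the rank derived from the number of rows actually scraped. This is only a sketch on made-up sample lists; the helper name `build_chart_frame` is hypothetical, and it assumes both lists have the same length.

```python
import pandas as pd


def build_chart_frame(titles, singers):
    """Build a Rank/Title/Singer table; assumes both lists have equal length."""
    return pd.DataFrame({
        'Rank': list(range(1, len(titles) + 1)),  # 1-based rank per row
        'Title': titles,
        'Singer': singers,
    })


# Hypothetical sample data standing in for list_song and list_singer
sample = build_chart_frame(["Song A", "Song B"], ["Artist A", "Artist B"])
print(sample)
```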

Step6: Export to Excel

now = datetime.now()
filename = now.strftime('Melon_Top100_at_%Y%m%d_%Hh%Mm.xlsx')
data.to_excel(filename, index=False)
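
To check what the `strftime` pattern produces, here is a small sketch with a fixed, made-up timestamp instead of `datetime.now()`. The helper name `build_filename` is hypothetical. Note that writing `.xlsx` files with pandas also needs an Excel writer engine such as openpyxl to be installed.

```python
from datetime import datetime


def build_filename(moment):
    """Build the export file name from a datetime, using the Step6 pattern."""
    # %Y%m%d -> date, %H and %M -> hour and minute; 'h' and 'm' are literal text
    return moment.strftime('Melon_Top100_at_%Y%m%d_%Hh%Mm.xlsx')


# A fixed, made-up timestamp so the result is reproducible
print(build_filename(datetime(2023, 12, 18, 9, 5)))
# prints: Melon_Top100_at_20231218_09h05m.xlsx
```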