Date : January 8, 2024
📌 Stock closing-price lookup system
- Main language : Python
- Main libraries : BeautifulSoup, Pandas
- Lookup sites : Naver Finance, investing
- Stocks : Samsung Electronics (005930), SCHD ETF
- Lookup content : daily closing prices of Samsung Electronics over 200 days, and the most recent closing price of the SCHD ETF
📌 Lookup method : BeautifulSoup
- From the Samsung Electronics item page, iterate with a for loop over the pages that list the closing prices
- For each of the 10 rows listed per page, retrieve the trading date and closing price with a nested for loop
- Return the retrieved content as text (the full loop appears in the code block at the end of this note)
📌 Code summary : BeautifulSoup
- bs4 : BeautifulSoup library, version 4
- requests : requests the page's HTML document using the URL and header information
- response : the HTML document returned for the request
- URL Information : finance.naver.com
- Header Information : useragentstring.com
- parser : splits the document content into tokens and builds a parse tree
- isCheckNone : used to filter out (skip) rows whose value is None
📌 Lookup method : Pandas
- From the Samsung Electronics item page, retrieve the pages that list the closing prices as pandas DataFrames
- From each retrieved page, extract data such as the trading date and closing price with read_html
- Return the extracted content as text (a minimal sketch follows this list)
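A minimal sketch of the single-page Pandas lookup described above, assuming the same Naver Finance daily-price URL and a browser-like User-Agent header as in the code at the end of this note; read_html also needs an HTML parser such as lxml installed.

import io

import pandas as pd
import requests

# Same daily-price page as the BeautifulSoup version below (Samsung Electronics, 005930).
url = "http://finance.naver.com/item/sise_day.nhn?code=005930&page=1"
headers = {"User-Agent": "Mozilla/5.0"}  # the page may not respond without a browser-like User-Agent

response = requests.get(url, headers=headers)

# read_html parses every <table> on the page into a DataFrame;
# the daily-price table (trading date, prices) is typically the first one.
table = pd.read_html(io.StringIO(response.text))[0]

# Return the retrieved content as text, as described above.
print(table.to_string())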
📌 Code summary : Pandas (see the sketch after this list)
- DataFrame : pandas tabular data structure that holds the extracted rows
- append : adds rows to a DataFrame (deprecated in recent pandas; pd.concat replaces it)
- read_html : parses the HTML tables on a page into DataFrames
- ignore_index = True : renumbers the row index when tables are combined
- dropna : drops rows containing missing (NaN) values
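A sketch that ties the items above together, under the same assumptions as the previous snippet. Because DataFrame.append is deprecated in recent pandas releases, the per-page tables are collected in a list and combined with pd.concat(ignore_index=True); dropna then removes the spacer rows that come through as NaN.

import io

import pandas as pd
import requests

headers = {"User-Agent": "Mozilla/5.0"}  # browser-like header, as noted above
frames = []

# 10 rows per page, so 20 pages cover roughly 200 trading days.
for page in range(1, 21):
    url = f"http://finance.naver.com/item/sise_day.nhn?code=005930&page={page}"
    response = requests.get(url, headers=headers)
    frames.append(pd.read_html(io.StringIO(response.text))[0])  # first table on each page

# pd.concat replaces the deprecated DataFrame.append; ignore_index=True renumbers the rows,
# and dropna() drops the NaN spacer rows that read_html returns.
prices = pd.concat(frames, ignore_index=True).dropna()

# Return the extracted content as text, as described above.
print(prices.to_string())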
# BeautifulSoup version : daily closing prices of Samsung Electronics (005930)
from bs4 import BeautifulSoup
import requests

# 10 rows per page; pages 1-5 cover 50 trading days (widen the range for 200 days)
for page in range(1, 6):
    print(str(page))  # progress : current page number
    url_005930 = "http://finance.naver.com/item/sise_day.nhn?code=005930" + "&page=" + str(page)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
    response = requests.get(url_005930, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    parsing_list = soup.find_all("tr")  # every table row on the page
    isCheckNone = None
    for i in range(1, len(parsing_list)):
        if parsing_list[i].span != isCheckNone:  # skip spacer rows that contain no <span>
            print(parsing_list[i].find_all("td", align="center")[0].text,  # trading date
                  parsing_list[i].find_all("td", class_="num")[0].text)    # closing price
📌 References
- FastCampus lecture by instructor Selena