Crawling Practice 1

yun · February 17, 2025

TIL Crawling

I practiced while following the Inflearn course 이것이 진짜 크롤링이다 [기본편] ("This Is Real Crawling [Basics]").
The setup is Python, VS Code, and beautifulsoup4.

✅How to crawl a static page

1. Fetching the data

Send a request to the server from Python and receive the response
The HTML comes back over HTTP
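The fetch step can be sketched roughly as below. The helper name `fetch_html` and the timeout value are my own choices for illustration, not something from the course; the point is only that a request goes out, the status is checked, and the HTML comes back as a string.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as text.

    fetch_html is a hypothetical helper name used only in this sketch.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx so an error page is not parsed by mistake.
    response.raise_for_status()
    return response.text


# Example call (needs network access):
# html = fetch_html("https://startcoding.pythonanywhere.com/basic")
```

Checking the status before parsing is optional for a practice site, but it makes a failed request much easier to notice.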

2. Extracting the data (beautifulsoup4)

Extract only the parts you want from the HTML
Writing a good CSS selector is the key
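To make the idea of a CSS selector concrete, here is a small sketch that runs on a made-up HTML string. The markup is an assumption for illustration and is not the real markup of the practice site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this sketch.
html = """
<div id="header"><a class="brand-name" href="/home">Shop</a></div>
<ul>
  <li class="product">Keyboard</li>
  <li class="product">Mouse</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select_one(".brand-name")  # class selector: first match only
by_id = soup.select_one("#header")         # id selector: first match only
products = soup.select("li.product")       # tag + class: list of every match

print(by_class.text)
print(by_id.name)
print([p.text for p in products])
```

`select_one` returns the first matching tag, while `select` returns a list of all matches, so the selector decides what comes back in both cases.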

Crawling practice site
http://startcoding.pythonanywhere.com/basic

✅Practicing with requests

🪄Getting the selector

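In the F12 developer tools, check whether the tag has a "nickname" (a class or id)
Press Ctrl + F, search for the selector (for example .class-name), and confirm that it matches the element you want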

import requests
from bs4 import BeautifulSoup

response = requests.get("https://startcoding.pythonanywhere.com/basic")
html = response.text
soup = BeautifulSoup(html, 'html.parser')
soup.select_one("selector")

Put the tag's nickname (its class or id selector) where the string "selector" appears.
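One thing worth knowing here: `select_one` returns `None` when nothing matches, so calling `.text` on the result then fails with an `AttributeError`. A minimal sketch of guarding against that, using a made-up HTML string and a hypothetical helper name `select_text`:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this sketch.
html = '<a class="brand-name" href="/home">Shop</a>'
soup = BeautifulSoup(html, "html.parser")


def select_text(soup, selector, default=None):
    """Return the text of the first match, or default if nothing matches."""
    tag = soup.select_one(selector)
    return tag.text if tag is not None else default


found = select_text(soup, ".brand-name")
missing = select_text(soup, ".no-such-class", default="")

print(repr(found))
print(repr(missing))
```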

✅Crawling examples

🧑‍💻To get only the text

soup.select_one(".brand-name").text

Result

'스타트코딩'

🧑‍💻To get the attributes, including the address (returned as a dictionary)

soup.select_one(".brand-name").attrs

Result

{'class': ['brand-name'],
 'href': 'https://www.youtube.com/channel/UCHwhZ7HPBhUh2IscPSL0pHA',
 'target': '_blank'}

🧑‍💻To get only the address

soup.select_one(".brand-name").attrs['href']

Result

'https://www.youtube.com/channel/UCHwhZ7HPBhUh2IscPSL0pHA'

🧑‍💻To get only the target attribute value

soup.select_one(".brand-name").attrs['target']

Result

'_blank'
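Besides going through `.attrs`, a tag can be indexed directly, and `.get()` avoids a `KeyError` when an attribute is absent. The sketch below uses a made-up tag (the href is a placeholder, not the real channel address).

```python
from bs4 import BeautifulSoup

# Hypothetical tag used only for this sketch.
html = '<a class="brand-name" href="https://example.com/channel" target="_blank">Shop</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.select_one(".brand-name")

href = tag["href"]        # same value as tag.attrs["href"]
title = tag.get("title")  # attribute is absent, so this is None instead of a KeyError
classes = tag["class"]    # class can hold several values, so it comes back as a list

print(href)
print(title)
print(classes)
```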

🧑‍💻strip(): removes leading and trailing whitespace; replace('old text', 'new text'): replaces text in a string

price = soup.select_one(".product-price").text.strip().replace(',','').replace('원','')

Result
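As a rough illustration of what this chain produces, the sketch below runs it on a made-up price string. The value is an assumption; the real text on the practice site may differ.

```python
# Hypothetical raw text, shaped like what .text might return for a price element.
raw = "\n   12,900원  \n"

# Same chain as above: trim whitespace, drop the thousands separator, drop the unit.
cleaned = raw.strip().replace(",", "").replace("원", "")

# Once only digits remain, the string can be converted to a number.
price = int(cleaned)

print(repr(cleaned))
print(price)
```

Converting to `int` is an extra step beyond the original line, added because a numeric price is easier to sort or sum later.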

🧑‍💻To select only a child tag inside a parent tag

soup.select_one(".product-name > a").attrs['href']

Result

#product1_detail.html
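The `>` in the selector means "direct child", which differs from a plain space ("any descendant"). A small sketch on made-up HTML, which also shows one way to turn a relative href into a full URL with `urljoin`. The markup and the base URL are assumptions for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical HTML used only for this sketch.
html = """
<div class="product-name">
  <a href="product1_detail.html">Product 1</a>
  <span><a href="review.html">Reviews</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

direct = soup.select(".product-name > a")  # only <a> tags that are direct children
nested = soup.select(".product-name a")    # every <a> anywhere inside the parent

# Resolve the relative href against a placeholder page URL.
base_url = "https://example.com/basic"
full_url = urljoin(base_url, direct[0]["href"])

print(len(direct), len(nested))
print(full_url)
```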
