A Smart HTML Parser - BeautifulSoup4: 2-5. Getting the Elements You Want II

임동윤 · September 27, 2022

Fetching Questions from Hashcode

Making a Request with a User-Agent

Set the User-Agent header.

import requests
from bs4 import BeautifulSoup

user_agent = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}

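Send the request with the User-Agent included. The dictionary has to be passed with the headers keyword, because the second positional argument of requests.get is params (the query string), not headers.
res = requests.get("https://hashcode.co.kr/", headers=user_agent)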

Create a BeautifulSoup object from the response.
soup = BeautifulSoup(res.text, "html.parser")
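
The difference between sending the dictionary as params and as headers can be checked without touching the network. The following is a minimal sketch that only prepares the two requests with requests.Request and inspects them; the variable names as_params and as_headers are illustrative and not part of the original post.

```python
import requests

user_agent = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
url = "https://hashcode.co.kr/"

# Build both requests without sending them, so no network access is needed.
as_params = requests.Request("GET", url, params=user_agent).prepare()
as_headers = requests.Request("GET", url, headers=user_agent).prepare()

# Passed as params (which is what the second positional argument of
# requests.get means): the value is URL-encoded into the query string
# and no User-Agent header is set on the prepared request.
print(as_params.url)
print("User-Agent" in as_params.headers)

# Passed as headers: the URL stays as it was and the header is set.
print(as_headers.url)
print(as_headers.headers["User-Agent"])
```

When the header is missing from the prepared request, requests falls back to its own default User-Agent at send time, which is why the keyword matters for this kind of request.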

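Collect the question titles and print them.

# Each question is an <li> element with the class "question-list-item".
questions = soup.find_all("li", "question-list-item")

for question in questions:
    # The title is the <h4> inside div.question > div.top.
    print(question.find("div", "question").find("div", "top").h4.text)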
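
The chained find calls raise an AttributeError as soon as one item lacks the expected nesting. As a hedged sketch, the same lookup can be tried on a small inline HTML string. The sample markup is an assumption that only mirrors the nesting implied by the code above (it is not copied from the live site), and extract_titles is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the selectors expect.
sample_html = """
<ul>
  <li class="question-list-item">
    <div class="question"><div class="top"><h4> First question </h4></div></div>
  </li>
  <li class="question-list-item">
    <div class="question"><div class="top"></div></div>
  </li>
  <li class="other-item"><h4>Not a question</h4></li>
</ul>
"""

def extract_titles(html):
    """Return the title text of every question item that has an <h4>."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # A string as the second argument of find_all matches the CSS class.
    for item in soup.find_all("li", "question-list-item"):
        # select_one returns None instead of raising when nothing matches.
        h4 = item.select_one("div.question div.top h4")
        if h4 is not None:
            titles.append(h4.get_text(strip=True))
    return titles

titles = extract_titles(sample_html)
print(titles)
```

The second list item has no h4, so it is skipped rather than stopping the loop.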

Pagination

Pagination is a technique for splitting a large amount of content into numbered pages.
This site selects the page through the query string, for example ?page=2.
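
As a small illustration of the query string idea, the page URLs can be built with the standard library. This is only a sketch: page_url is a hypothetical helper, and the parameter name page is taken from the URL used in the request code of this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://hashcode.co.kr/"

def page_url(page):
    """Build the listing URL for one page number."""
    return BASE_URL + "?" + urlencode({"page": page})

# URLs for pages 1 through 4.
urls = [page_url(i) for i in range(1, 5)]
for u in urls:
    print(u)

# Reading the page number back out of a URL.
query = parse_qs(urlsplit(urls[2]).query)
print(query["page"])
```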

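Fetch the question titles from the paginated list. The code below requests pages 1 through 4 and prints the first five titles on each page.
To avoid sending too many requests in a short time, it pauses for 0.5 seconds after each request.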

import time

for i in range(1, 5):  # pages 1 through 4
    # headers= is required; a bare second argument would be sent as params.
    res = requests.get("https://hashcode.co.kr/?page={}".format(i), headers=user_agent)
    soup = BeautifulSoup(res.text, "html.parser")

    questions = soup.find_all("li", "question-list-item")
    for question in questions[:5]:  # first five questions on the page
        print(question.find("div", "question").find("div", "top").h4.text)
    print("\n")
    time.sleep(0.5)  # pause between requests
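
The loop above can also be split into a download step and a parsing step, which makes the parsing easy to try without hitting the site. This is a sketch under stated assumptions, not the approach used in the post: fetch_page and collect_titles are hypothetical names, the timeout value is arbitrary, and the CSS selector assumes the same nesting as the find calls above.

```python
import time

import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}

def fetch_page(page):
    """Download one listing page and return its HTML."""
    res = requests.get(
        "https://hashcode.co.kr/",
        params={"page": page},   # becomes ?page=<n>
        headers=HEADERS,
        timeout=10,              # arbitrary value chosen for this sketch
    )
    res.raise_for_status()       # raise on 4xx/5xx instead of parsing an error page
    return res.text

def collect_titles(pages, fetch=fetch_page, delay=0.5):
    """Return {page: [titles]} and wait `delay` seconds between downloads."""
    titles_by_page = {}
    for n, page in enumerate(pages):
        if n > 0:
            time.sleep(delay)    # no wait is needed before the first request
        soup = BeautifulSoup(fetch(page), "html.parser")
        headings = soup.select("li.question-list-item div.question div.top h4")
        titles_by_page[page] = [h4.get_text(strip=True) for h4 in headings]
    return titles_by_page
```

Calling collect_titles(range(1, 5)) would perform the real requests. Passing a different fetch function, such as one that returns saved HTML, exercises only the parsing.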
