HTML | Sorting out the HTML element lookup methods I kept mixing up (select, find)

소리 · October 16, 2023

beautifulsoup html urllib

Zerobase data analysis study


Finding elements with Beautiful Soup

from bs4 import BeautifulSoup

(1) find / find_all: search mainly by tag name and attributes.

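find(name, attrs, recursive, string, **kwargs)
: returns a single Tag (the first match), or None if nothing matches
find_all(name, attrs, recursive, string, limit, **kwargs)
: returns a ResultSet (a list of Tags) containing every match
(The string argument was called text in older versions of Beautiful Soup.)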
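A small sketch of the return types, using a made-up HTML snippet (the markup and link texts are only for illustration):

```python
from bs4 import BeautifulSoup, Tag
from bs4.element import ResultSet

# Hypothetical markup, just to have something to search
html = """
<ul>
  <li><a class="title" href="/1">First</a></li>
  <li><a class="title" href="/2">Second</a></li>
  <li><a class="title" href="/3">Third</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a", attrs={"class": "title"})            # Tag: only the first match
everything = soup.find_all("a", attrs={"class": "title"})   # ResultSet: all matches
two = soup.find_all("a", class_="title", limit=2)           # limit stops after 2 matches

print(type(first).__name__, first.text)
print(type(everything).__name__, len(everything))
print([a.text for a in two])

# When nothing matches: find gives None, find_all gives an empty ResultSet
print(soup.find("table"), len(soup.find_all("table")))
```

Because find can return None, calling .text on it directly raises AttributeError when the tag is missing, so it is worth checking the result first on real pages.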

cartoons = soup.find_all('a', attrs={'class':'title'})

# Both lines give the same result
soup.find('div', {'class' : 'g2b'}).text
soup.find('div', class_ = 'g2b').text
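A minimal sketch comparing the two forms on a hypothetical snippet (the g2b class name is reused from the example above; the text content is invented):

```python
from bs4 import BeautifulSoup

html = '<div class="g2b">Notice board</div><div class="g2b big">Second board</div>'
soup = BeautifulSoup(html, "html.parser")

# Dictionary form: attributes passed as a dict
via_attrs = soup.find("div", {"class": "g2b"}).text
# Keyword form: class is a reserved word in Python, hence the trailing underscore
via_kwarg = soup.find("div", class_="g2b").text

# class is multi-valued, so "g2b" also matches the div whose class is "g2b big"
both = [div.text for div in soup.find_all("div", class_="g2b")]
print(via_attrs, via_kwarg, both)
```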

Besides find and find_all, Beautiful Soup also provides find_all_previous, find_previous, find_all_next, find_next, find_previous_siblings, find_previous_sibling, find_next_siblings, find_next_sibling, find_parents, and find_parent, which search relative to an element you already have.
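A sketch of a few of these relative searches, starting from the middle item of a hypothetical list (ids, classes, and texts are made up):

```python
from bs4 import BeautifulSoup

html = """
<div id="board">
  <h2>Notices</h2>
  <p class="item">A</p>
  <p class="item">B</p>
  <p class="item">C</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
middle = soup.find_all("p", class_="item")[1]   # the <p> containing "B"

next_p = middle.find_next_sibling("p")          # next <p> on the same level
prev_p = middle.find_previous_sibling("p")      # previous <p> on the same level
heading = middle.find_previous("h2")            # nearest <h2> earlier in the document
parent = middle.find_parent("div")              # closest enclosing <div>
after = middle.find_all_next("p")               # every <p> that comes after it

print(next_p.text, prev_p.text, heading.text, parent["id"], [p.text for p in after])
```

Passing a tag name filters out the whitespace strings between tags, which the sibling methods would otherwise also walk over.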

(2) select / select_one: find elements using CSS selectors.

select_one: returns the first match as a Tag, or None if nothing matches
select: returns a list of every match

cartoons = soup.select('a.title')
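A minimal sketch of select versus select_one on invented markup (the a.title selector matches the line above; the links are hypothetical):

```python
from bs4 import BeautifulSoup

html = """
<a class="title" href="/1">First</a>
<a class="title" href="/2">Second</a>
<a class="other" href="/3">Other</a>
"""
soup = BeautifulSoup(html, "html.parser")

cartoons = soup.select("a.title")       # every <a> with class "title"
first = soup.select_one("a.title")      # only the first match
missing = soup.select_one("a.nope")     # None when nothing matches

titles = [a.text for a in cartoons]
hrefs = [a["href"] for a in cartoons]   # attributes are read like dict keys
print(titles, hrefs, first.text, missing)
```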

select syntax details

soup.select('.classname')  a leading . means class (this is CSS selector syntax)
soup.select('#idname')  a leading # means id
soup.select('tagname')  a bare tag name with no prefix selects by tag
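A short sketch of the three basic selector forms, plus combining a tag with a class or id (no space in between, so the same element must satisfy both). The markup is hypothetical:

```python
from bs4 import BeautifulSoup

html = """
<div id="main">
  <p class="name">Kim</p>
  <p class="name" id="sheet">Lee</p>
  <span class="name">Park</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select(".name")            # any tag with class "name"
by_id = soup.select("#main")               # the element whose id is "main"
by_tag = soup.select("p")                  # every <p>
tag_and_class = soup.select("span.name")   # a <span> that also has class "name"
tag_and_id = soup.select("p#sheet")        # a <p> whose id is "sheet"

print(len(by_class), by_id[0].name, len(by_tag), tag_and_class[0].text, tag_and_id[0].text)
```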

Examples:

soup.select('Inl#sheet')  # Inl is the tag name, sheet is the id

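soup.select('.wala .dodo em')[0].text  # find the em tags inside a .dodo element that is itself inside a .wala element, then take the first one's text
# a space means "somewhere inside" (descendant)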
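A sketch contrasting the space (any depth) with > (direct children only), reusing the wala/dodo class names on made-up markup. The > combinator is standard CSS and goes beyond what the notes above cover:

```python
from bs4 import BeautifulSoup

html = """
<div class="wala">
  <div class="dodo">
    <em>direct child</em>
    <p><em>nested deeper</em></p>
  </div>
  <em>outside dodo</em>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Space: any <em> at any depth under .wala .dodo
descendants = [em.text for em in soup.select(".wala .dodo em")]
# ">": only <em> elements that are direct children of .dodo
children = [em.text for em in soup.select(".wala .dodo > em")]
first_text = soup.select(".wala .dodo em")[0].text

print(descendants, children, first_text)
```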

🔎 Same result, different syntax

soup.find_all('tag')[0].text
== soup.select('tag')[0].text

soup.find_all(class_='name')[0].text
== soup.select('.name')[0].text

soup.find_all(id='name')[0].text
== soup.select('#name')[0].text
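A quick sketch checking these pairs on a hypothetical snippet, with find / select_one added as the single-result counterparts:

```python
from bs4 import BeautifulSoup

html = '<div><p class="name" id="name">Park</p><p class="name">Choi</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Each entry: (find_all / find version, select / select_one version)
pairs = {
    "tag": (soup.find_all("p")[0].text, soup.select("p")[0].text),
    "class": (soup.find_all(class_="name")[0].text, soup.select(".name")[0].text),
    "id": (soup.find_all(id="name")[0].text, soup.select("#name")[0].text),
    "single": (soup.find("p", class_="name").text, soup.select_one("p.name").text),
}
for kind, (found, selected) in pairs.items():
    print(kind, found, selected, found == selected)
```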

Extracting text

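.text  property that returns the text; same as calling get_text() with no arguments
.get_text()  strips all markup from the tag and its descendants and returns the combined text as one string; it also accepts a separator and strip=True (call it on the innermost tag you need if you don't want the text of every child merged together)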
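A small sketch on invented markup showing that .text and get_text() agree, and what separator and strip change:

```python
from bs4 import BeautifulSoup

html = "<div class='card'><h3> Title </h3><p>Body <b>bold</b> text</p></div>"
soup = BeautifulSoup(html, "html.parser")
card = soup.select_one(".card")

plain = card.text                                  # all descendant text, concatenated as-is
same = card.get_text()                             # identical to .text
joined = card.get_text(separator="|", strip=True)  # each piece stripped, joined with "|"
leaf = soup.select_one(".card b").get_text()       # innermost tag: just its own text

print(repr(plain), repr(joined), repr(leaf))
```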

Finding elements with Selenium

find_element(): use it when you want to locate and fetch one specific HTML element. (find_elements() returns every match as a list.)

from selenium.webdriver.common.by import By
The By class tells find_element how to locate an element on the page. The available strategies are ID, NAME, XPATH, LINK_TEXT, TAG_NAME, CLASS_NAME, CSS_SELECTOR, and PARTIAL_LINK_TEXT.

When an element can't be found by class or name, CSS_SELECTOR is often the way to go.
To read an element's text, attributes, or tag name, use dot access on the returned element:
.text  .get_attribute('attribute_name')  .tag_name
