TIL) Devcourse Day 9 - Selenium

Pori · October 26, 2023

TIL, Data Engineering, Devcourse, Web Scraping & Crawling

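The Selenium library

: Selenium is a browser automation framework; its Python bindings let you control a web browser from Python code.

Installation

: Install the library and a WebDriver before use. The version used here is 4.14.0.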

# Install in a Jupyter environment
%pip install selenium
%pip install webdriver-manager
# Install with Anaconda
conda install -c conda-forge selenium

Opening a Chrome window

The webdriver module can be used to open a Chrome window as follows.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# ChromeDriverManager().install() downloads a matching driver and returns its path, which is passed to Service.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("http://www.example.com")

# with-as version: the browser is closed automatically when the block exits
with webdriver.Chrome(service=Service(ChromeDriverManager().install())) as driver:
    driver.get("http://www.example.com")

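Finding elements: find_element

: By and find_element are used together to locate elements on the page.

.find_element(by, target) : returns the first matching element
.find_elements(by, target) : returns a list of all matching elements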

# Example: find a p tag and print its text
from selenium.webdriver.common.by import By

with webdriver.Chrome(service=Service(ChromeDriverManager().install())) as driver:
    driver.get("http://www.example.com")
    print(driver.find_element(By.TAG_NAME, "p").text)
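The practical difference between the two calls shows up when nothing matches: find_element raises NoSuchElementException, while find_elements returns an empty list. The sketch below wraps find_elements in two small helpers; the names `first_or_none` and `all_texts` are hypothetical and only for illustration. They accept any driver object that exposes find_elements.

```python
def first_or_none(driver, by, value):
    """Return the first matching element, or None when nothing matches."""
    # find_elements returns an empty list on no match,
    # whereas find_element would raise NoSuchElementException.
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


def all_texts(driver, by, value):
    """Return the visible text of every matching element."""
    return [element.text for element in driver.find_elements(by, value)]
```

With a real driver, this could be called as `all_texts(driver, By.TAG_NAME, "p")` after `driver.get(...)`.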

Wait

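: When scraping dynamic pages, it is sometimes necessary to wait for content to finish loading.

Implicit Wait : a driver-wide setting. When an element is not found right away, the driver keeps retrying for up to the given number of seconds before raising an error. driver.implicitly_wait(5)
Explicit Wait : uses the until method to wait until a specific condition holds for the target element (for example, that it is present), and only then runs the next command.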

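from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Waits up to 10 seconds; returns the element once it is present, otherwise raises TimeoutException.
element = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'')))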
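Conceptually, an explicit wait is a polling loop: check a condition, sleep briefly, and repeat until the condition is truthy or the time runs out. The following is a simplified stand-in in plain Python, not Selenium's actual implementation; `wait_until` and `poll_interval` are hypothetical names. The real WebDriverWait also passes the driver to the condition, ignores NoSuchElementException while polling by default, and raises Selenium's own TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Like until(), hand back whatever the condition produced.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

Returning the condition's result is what lets the explicit-wait call above both wait for and obtain the element in a single expression.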

Event handling (mouse, keyboard)

: ActionChains can be used to perform actions such as mouse and keyboard input.

from selenium.webdriver.common.by import By
from selenium.webdriver import Keys, ActionChains

# Click a button (the empty XPath is a placeholder for the target element's path)
button = driver.find_element(By.XPATH,'')
ActionChains(driver).click(button).perform()

# Send a value to an input element
text_input = driver.find_element(By.XPATH,'')
ActionChains(driver).send_keys_to_element(text_input, "input_text").perform()
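ActionChains calls can also be chained, and nothing is sent to the browser until perform() is called. The sketch below queues a click, some typing, and an Enter key press as one sequence; `type_and_submit` is a hypothetical helper name, and it assumes the caller already has a live driver and a located input element. Selenium is imported inside the function so that the definition itself loads even where Selenium is not installed.

```python
def type_and_submit(driver, text_input, text):
    """Click an input, type text into it, and press Enter as one queued sequence."""
    # Imported lazily; requires the selenium package at call time.
    from selenium.webdriver import ActionChains, Keys

    (
        ActionChains(driver)
        .click(text_input)      # focus the input element
        .send_keys(text)        # type into the focused element
        .send_keys(Keys.ENTER)  # press Enter
        .perform()              # the queued actions run only here
    )
```

A call might look like `type_and_submit(driver, text_input, "input_text")` once `text_input` has been located with find_element.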

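What I studied

Selenium hands-on practice

What I newly learned

In the past I downloaded ChromeDriver manually, but I learned that the driver can now be installed directly from code.

Thoughts & notes

: Selenium has been updated a lot and has become more convenient. Taking a lecture based on the latest version was a good refresher.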
