TIL 191120

김상훈 · November 20, 2019

Web Scraping with Python

Mostly a failure...

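# Code used

from selenium import webdriver

# PhantomJS driver (Selenium reports its PhantomJS support as deprecated)
driver = webdriver.PhantomJS()
driver.get('http://www.inven.co.kr/board/lostark/5353/172087')
# Grab the first element whose class is "comment"
temp = driver.find_element_by_class_name('comment')

print(temp.text)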

Problems I ran into
1. (Before using selenium & phantomjs)
Beautiful Soup alone does not seem to be enough
to crawl a page that is rendered with JavaScript.
Then I found a reference that uses selenium and phantomjs.
(https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python)
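
To make the point above concrete: a static parser only sees the markup the server sent, not what a script adds later in the browser. Below is a minimal sketch that uses the standard library's html.parser as a stand-in for Beautiful Soup. RAW_HTML, ClassTextCollector and static_text are made-up names for illustration, and the HTML is a hypothetical response, not the real inven page.

```python
from html.parser import HTMLParser

# Hypothetical server response: the comment container is empty and a
# script is expected to fill it in once the page runs in a browser.
RAW_HTML = """
<html><body>
  <div class="comment"></div>
  <script>
    document.querySelector('.comment').textContent = 'first comment';
  </script>
</body></html>
"""


class ClassTextCollector(HTMLParser):
    """Collect text found inside elements that carry a given class.

    Sketch only: void tags such as <br> inside a match are not handled.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # greater than 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html, class_name):
    """Return the text a static parser can see for the given class."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    return " ".join(parser.chunks)


# The script is never executed, so the container stays empty.
print(repr(static_text(RAW_HTML, "comment")))
```

The script body is just text to the parser, so nothing ever lands inside the container. That would be why a real browser engine is needed for this kind of page.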

(After using selenium & phantomjs)
The following error occurred.
Message: Error - Unable to load Atom 'find_element' from file ':/ghostdriver/./third_party/webdriver-atoms/find_element.js'
This error reportedly occurs when phantomjs is installed via apt-get.
(https://stackoverflow.com/questions/36770303/unable-to-load-atom-find-element)
(After reinstalling phantomjs)
The following warning appeared.
UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless
This warning had actually been showing up all along, even before this point.
The problem is that crawling still does not work even after going through step 2.
The script itself runs, but not a single character of text comes back.
That is what made me look into the warning.
It means that Chrome or Firefox should be used in headless mode instead.
(https://stackoverflow.com/questions/50416538/python-phantomjs-says-i-am-not-using-headless)
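
Following what the warning suggests, the same lookup could be tried with headless Chrome. This is only a sketch I have not verified against the page above. It assumes Chrome and a matching chromedriver are installed, and fetch_text, HEADLESS_ARGS and the 10-second timeout are made-up names and values. The explicit wait is also an assumption: it may or may not be the reason no text came back.

```python
# Command-line switches passed to Chrome; "--headless" runs it without a window.
HEADLESS_ARGS = ["--headless"]


def fetch_text(url, class_name, timeout=10):
    """Open url in headless Chrome and return the text of the first
    element with the given class, waiting up to `timeout` seconds."""
    # Imported here so this file can still be loaded where selenium
    # is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the element is present in the DOM before reading it.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Example call (needs a network connection and a local Chrome install):
# print(fetch_text('http://www.inven.co.kr/board/lostark/5353/172087', 'comment'))
```

If the element never appears, the wait raises a TimeoutException instead of silently returning nothing, which should at least make the failure easier to see.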

Phew... I'm tired.
