K-Digital Training (Big Data) Day 8

유현민 · August 4, 2021

Enterprise-tailored big data analyst training course (한국품질재단 lifelong education facility), 21.07.26 - 22.01.29


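Today we learned about crawling. It was my first time doing it, so it felt unfamiliar and I fumbled around a lot. But the more I did it, the more I got used to it, and it ended up being fun.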

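Method

1. Install selenium

pip install selenium

2. Download the Chrome driver
Check your Chrome version before downloading. In my case a driver for a lower version than the browser was fine, but a driver for a higher version did not work. ChromeDriver releases are built for a specific Chrome major version, so picking the matching one is the safe choice.
3. Verify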

from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get('URL')
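time.sleep(3)  # wait so nothing is typed or read before the page has loaded
driver.quit()  # quit() closes the browser and ends the driver session (close() only closes the current window)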
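The version check in the driver download step can be sketched as a small helper. This is only a minimal sketch that assumes "compatible" means "same major version"; the function names and the version strings below are hypothetical and only there for illustration.

```python
def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.strip().split(".")[0])


def driver_matches_chrome(chrome_version: str, driver_version: str) -> bool:
    """True when both version strings share the same major number."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(driver_matches_chrome("92.0.4515.131", "92.0.4515.107"))
    print(driver_matches_chrome("92.0.4515.131", "93.0.4577.15"))
```

The Chrome version itself can be read from the browser's "About Chrome" page.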

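Find what to crawl
Press F12 to open the developer tools and check the class or tag of the element you want.

# Selenium 3 style; in Selenium 4.3+ these helpers were removed in favor of driver.find_elements(By.CLASS_NAME, ...)
e = driver.find_elements_by_class_name('CLASS_NAME')  # plural "elements" returns a list; the singular "element" version did not find it for me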

If you want to move to a different address, call get again:
driver.get('URL')
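Every call to get needs some time before the page is usable, which is why the fixed time.sleep(3) appears above. One alternative is to poll a condition until it becomes true. The helper below is a hypothetical sketch in plain Python (wait_until is not a selenium function); selenium also ships WebDriverWait for the same purpose.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 10.0,
               poll: float = 0.5) -> bool:
    """Call condition() repeatedly until it returns True or timeout seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)  # short pause between checks instead of one long sleep
```

With a real driver, the condition could be something like lambda: driver.execute_script("return document.readyState") == "complete", assuming that "page finished loading" is the state you actually need.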

You also need to install beautifulsoup.

pip install beautifulsoup4

Searching Naver for 강아지 (puppy) and fetching the results

from selenium import webdriver
# from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from urllib.parse import quote_plus  # needed to handle Korean text in the URL
import time


# baseUrl = 'https://www.google.com/search?q='
baseUrl = 'https://search.naver.com/search.naver?where=nexearch&sm=top_hty&fbm=0&ie=utf8&query='
plusUrl = input('Enter a search term: ')


url = baseUrl + quote_plus(plusUrl)  # quote_plus is required: it percent-encodes the query

print(url)

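driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)  # give the page time to load before reading its source

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML into a searchable tree

f = open("a_text.txt", 'w', encoding='utf-8')  # utf-8 so Korean text is written correctly


# Alternative: select every h3 and write its text to the file
# titleLists = soup.select('h3')
# for title in titleLists:
#     data = title.text + "\n"
#     f.write(data)

titleLists = soup.select('.api_txt_lines')
for title in titleLists:
    print(title.text)
    print(title.get('href'))  # None when the matched tag has no href attribute
    f.write(title.text + "\n")  # also save the title to a_text.txt


f.close()
driver.quit()  # close the browser when done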
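The parts of this script that do not need a browser can be pulled into small functions and tried on their own. The sketch below is one possible way to do that, not a required structure: build_search_url and save_results are hypothetical names, urlencode is used in place of manual string concatenation (it applies quote_plus to each value), and the tab-separated output format is an assumption.

```python
from urllib.parse import urlencode

NAVER_SEARCH = "https://search.naver.com/search.naver"


def build_search_url(query: str) -> str:
    """Build the Naver search URL; urlencode percent-encodes the Korean query."""
    params = {
        "where": "nexearch",
        "sm": "top_hty",
        "fbm": "0",
        "ie": "utf8",
        "query": query,
    }
    return NAVER_SEARCH + "?" + urlencode(params)


def save_results(rows, path="a_text.txt"):
    """Write (title, href) pairs as tab-separated lines.

    Rows whose href is None are skipped, since a class selector can also
    match tags that are not links. Returns the number of lines written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:  # closed automatically
        for title, href in rows:
            if href is None:
                continue
            f.write(f"{title.strip()}\t{href}\n")
            written += 1
    return written
```

In the script above, the rows could come from [(t.text, t.get('href')) for t in titleLists].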
