(Package) Selenium

임경민 · October 28, 2023

Selenium



Main features


  • A tool for remotely controlling a web browser
  • Opens URLs, clicks elements, and performs similar actions automatically
  • Also handles scrolling, text input, screenshots, and more

What Beautiful Soup alone cannot handle

  • The URL of the content you need is not known in advance
  • The page builds its content with JavaScript
  • The site can only be reached through a real web browser

Install Package


  • Windows, Mac (Intel)
conda install selenium
  • Mac (M1)
pip install selenium
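
After installing, it can help to confirm which Selenium version is actually present, since the automatic driver download mentioned later relies on Selenium Manager (Selenium 4.6 and later). The following is a minimal sketch using only the standard library; the helper names `installed_version` and `major_minor` are illustrative and not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_minor(version_string):
    """Turn a version such as '4.15.2' into the tuple (4, 15) for comparisons."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


selenium_version = installed_version("selenium")
if selenium_version is None:
    print("selenium is not installed")
else:
    print("selenium", selenium_version)
```

Comparing `major_minor(selenium_version) >= (4, 6)` is one way to decide whether a manual chromedriver download is still needed.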

Selenium setup


Checking the Chrome version


  1. In Chrome, open the top-right menu: [Customize and control] - [Help] - [About Chrome]

  2. Check the version shown on the About Chrome page

  3. Search Google for Chromedriver
  • Link: https://chromedriver.chromium.org/downloads
  • Choose a release whose leading version number matches your Chrome version
    • Download the build that matches the OS you are using


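## Step 1. Using the Selenium webdriver

Loading the module and specifying the chromedriver path

※ Since a Selenium update, webdriver.Chrome() no longer needs a chromedriver path, and the driver does not have to be installed separately. Selenium 4.6 and later fetch a matching driver automatically through Selenium Manager.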

from selenium import webdriver

driver = webdriver.Chrome()  # start Chrome; no driver path is needed on recent Selenium
driver.get("https://www.naver.com")  # use get to open the address you want to visit
  • A new Chrome window opens and loads the given address
  • When you are completely finished, call driver.quit() so the browser does not keep using resources (the later examples assume the session is still open)
driver.quit()
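
One way to make sure `driver.quit()` always runs, even when a step in between raises an error, is to wrap the session in a context manager. This is a sketch under assumptions: `browser_session` is a hypothetical helper, not a Selenium API, and it accepts any zero-argument callable that returns a driver (for example `webdriver.Chrome`).

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions alike, so no browser is left behind.
        driver.quit()


# Intended use with a real browser (not executed here):
#
#     from selenium import webdriver
#     with browser_session(webdriver.Chrome) as driver:
#         driver.get("https://www.naver.com")
```

The trade-off is that the browser closes as soon as the `with` block ends, which suits scripts better than step-by-step exploration in a notebook.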

  • Check the current browser window size
# current browser window size
driver.get_window_size()

Step 2. Scrolling the page


  • Get the scrollable height (length) of the page
# scrollable height (length) of the page, read by running JavaScript
last_height = driver.execute_script("return document.body.scrollHeight")
last_height

  • Scroll: move to the bottom
# scroll to the bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

  • Take a screenshot
# screenshot
driver.save_screenshot("./last_height.png")

  • Scroll: move to the top
# scroll back to the top
driver.execute_script("window.scrollTo(0, 0);")
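
On pages that load more content as you scroll, the two scripts above are often combined into a loop: scroll down, wait, and compare the new height with the previous one. The sketch below is one possible version; `scroll_to_end`, `pause`, and `max_rounds` are illustrative names, and the one-second default pause is an assumption that may need tuning per site.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=30):
    """Scroll down repeatedly until the page height stops growing.

    Returns the final scroll height. `pause` gives lazily loaded content
    time to arrive; `max_rounds` guards against pages that never stop growing.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the real bottom
        last_height = new_height
    return last_height
```

Calling `scroll_to_end(driver)` on an open session leaves the page at the bottom, ready for a screenshot or for reading `driver.page_source`.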

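Selenium tag commands

find_element(By.CSS_SELECTOR, ...) => find, select_one

find_elements(By.CSS_SELECTOR, ...) => find_all, select

※ The older find_element_by_css_selector / find_elements_by_css_selector helpers were removed in Selenium 4.3; By is imported from selenium.webdriver.common.by.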
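
The singular and plural lookups also differ when nothing matches: the singular form raises `NoSuchElementException`, while the plural form returns an empty list. The sketch below builds on that; `first_or_none` and `texts_of` are hypothetical helpers, `by` is a locator strategy such as `By.CSS_SELECTOR`, and the `#paging > ul > li` selector in the usage comment is an assumed example.

```python
def first_or_none(driver, by, value):
    """Return the first matching element, or None when nothing matches.

    Uses find_elements (plural), which returns an empty list instead of
    raising when there is no match.
    """
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


def texts_of(driver, by, value):
    """Collect the text of every matching element, in page order."""
    return [element.text for element in driver.find_elements(by, value)]


# Intended use with a real driver (not executed here):
#
#     from selenium.webdriver.common.by import By
#     pager = first_or_none(driver, By.CSS_SELECTOR, "#paging > ul")
#     labels = texts_of(driver, By.CSS_SELECTOR, "#paging > ul > li")
```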


  • Scroll to a specific tag
# scroll until a specific tag is in view

from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# some_tag = driver.find_element(By.XPATH, '//*[@id="paging"]/ul')
some_tag = driver.find_element(By.CSS_SELECTOR, "#paging > ul")
action = ActionChains(driver)
action.move_to_element(some_tag).perform()
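
An alternative to ActionChains is to ask the browser itself to scroll through JavaScript, in the same way the earlier scroll commands use `execute_script`. This is a sketch, not the approach used in this post; `scroll_into_view` is a hypothetical helper, and centering the element with `block: 'center'` is a design choice that keeps it clear of sticky headers.

```python
def scroll_into_view(driver, element):
    """Scroll the page so that `element` sits in the middle of the viewport."""
    # `arguments[0]` is how execute_script exposes extra Python arguments to JavaScript.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    return element
```

With an open session, `scroll_into_view(driver, some_tag)` would bring the located tag into view before interacting with it.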

Step 3. Entering a search term


  • Type text into an input box
    ※ Input fails if the element is not currently visible on screen
# type text into an input box
# an element that is not visible on screen cannot receive input
from selenium.webdriver.common.by import By

some_tag = driver.find_element(By.ID, "gsc-i-id1")
some_tag.send_keys("data science")

  • Resize the browser window
    ※ Actions only work on what is currently visible on screen
# resize the browser window
# actions can only be taken on what is currently visible on screen
driver.set_window_size(1920, 1080)

  • Maximize the window
# maximize the window
driver.maximize_window()

  • Minimize the window
# minimize the window
driver.minimize_window()

  • Typing again appends to the existing text
# new input is appended after what is already in the box
some_tag.send_keys("python")

  • Clear the box, then enter the search term
# clear the box, then enter the search term

some_tag.clear()  # clear the existing text
some_tag.send_keys("python")
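
Because `send_keys` appends to whatever is already in the box, the clear-then-type pair is easy to forget. A small wrapper keeps the two steps together; `replace_text` is a hypothetical helper name, and `field` is any element returned by `find_element`.

```python
def replace_text(field, text):
    """Clear an input field, then type `text`, so old contents never linger."""
    field.clear()
    field.send_keys(text)
    return field
```

With the search box from above, `replace_text(some_tag, "python")` leaves only "python" in the field, regardless of what was typed before.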

Xpath


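How to find tags with XPath

  • //: searches all descendants at any depth, starting from the document root when it opens the path => like div form
  • *: matches a tag of any name
  • /: searches direct children only => like div > form
  • 'td[2]': the second td among its siblings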
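
The four pieces of syntax above can be tried without a browser. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs paths to start with `.`; the page fragment and its `gcse` id are made up for illustration and only resemble the search form used in this post.

```python
import xml.etree.ElementTree as ET

# A made-up page fragment shaped like a small search form.
page = ET.fromstring(
    "<html><body>"
    "<div id='gcse'><form><table><tr>"
    "<td><input name='q'/></td>"
    "<td><button>Search</button></td>"
    "</tr></table></form></div>"
    "</body></html>"
)

# `//` looks through all descendants, however deep (ElementTree needs a leading `.`).
all_cells = page.findall(".//td")

# `/` steps to direct children only: <form> is a child of <div>, <table> is not.
forms = page.findall(".//div/form")
tables_directly_in_div = page.findall(".//div/table")

# `*` matches any tag name; `[@id='gcse']` filters on an attribute.
# `td[2]` picks the second <td> among its siblings (counting starts at 1).
button = page.find(".//*[@id='gcse']/form/table/tr/td[2]/button")

print(len(all_cells), len(forms), len(tables_directly_in_div), button.text)
```

In Selenium the same path would be written without the leading `.`, since the browser evaluates full XPath against the whole document.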

  • Using XPath
    ※ BeautifulSoup cannot use XPath; Selenium can
# xpath
from selenium.webdriver.common.by import By

xpath = '//*[@id="___gcse_0"]/div/form/table/tbody/tr/td[2]/button'
driver.find_element(By.XPATH, xpath).click()  # click() returns None, so there is nothing to store

  • Using a CSS selector
# css_selector
from selenium.webdriver.common.by import By

css_selector = "#___gcse_0 > div > form > table > tbody > tr > td:nth-child(2) > button"
driver.find_element(By.CSS_SELECTOR, css_selector).click()  # click() returns None, so there is nothing to store

  • Get the HTML of the current page
# get the HTML of the current page
driver.page_source
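
Once the rendered HTML is available as a string, it can be handed to any parser. Beautiful Soup would be the natural choice given the comparison earlier in this post; the sketch below uses the standard library's `html.parser` instead so that it has no extra dependencies. `LinkCollector`, `extract_links`, and the sample HTML are illustrative, not taken from the original.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# With a live session this would be: extract_links(driver.page_source)
sample = (
    '<ul><li><a href="/a">A</a></li>'
    '<li><a href="/b">B</a></li>'
    "<li><a>no href</a></li></ul>"
)
print(extract_links(sample))
```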
