Scrapy Splash Guide

우진 · December 16, 2022

scraper

https://scrapeops.io/python-scrapy-playbook/scrapy-splash/

Install & Run Scrapy Splash

1. Pull the Splash Docker image


docker pull scrapinghub/splash

2. Run Scrapy Splash


docker run -it -p 8050:8050 --rm scrapinghub/splash
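
Once the container is running, it can be handy to confirm that Splash actually answers on port 8050 before wiring it into Scrapy. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable at http://localhost:8050 (matching the port mapping above) and relies on Splash's `/_ping` and `/render.html` HTTP endpoints; the helper names `render_url` and `splash_is_up` are made up for this example.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: Splash is published on localhost:8050, as in the docker run command.
SPLASH_URL = "http://localhost:8050"


def render_url(page_url, wait=0.5, splash_url=SPLASH_URL):
    """Build a render.html URL that asks Splash to load page_url,
    wait `wait` seconds for JavaScript, and return the rendered HTML."""
    query = urllib.parse.urlencode({"url": page_url, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def splash_is_up(splash_url=SPLASH_URL, timeout=5.0):
    """Return True if the Splash /_ping endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{splash_url}/_ping", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False
```

Calling `splash_is_up()` should return True while the container is running, and opening the string returned by `render_url(...)` in a browser should show the JavaScript-rendered version of the page.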

Use Scrapy Splash With Our Spiders

Clone the example project


git clone https://github.com/python-scrapy-playbook/quotes-js-project.git

1. Set up Scrapy Splash Integration


pip install scrapy-splash

# settings.py

# Splash Server Endpoint
SPLASH_URL = 'http://localhost:8050'


# Enable Splash downloader middleware and change HttpCompressionMiddleware priority
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Enable Splash Deduplicate Args Filter
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Define the Splash DupeFilter
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
  • Just paste all of the above into settings.py and you're done.
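
Because a single missing entry tends to fail quietly (requests simply bypass Splash), a small sanity check over the settings can help. This is only a sketch: the expected values are copied from the settings.py listing above, and `missing_splash_settings` is a hypothetical helper, not part of scrapy-splash.

```python
# Expected entries, copied from the settings.py listing above.
REQUIRED_MIDDLEWARES = {
    "DOWNLOADER_MIDDLEWARES": {
        "scrapy_splash.SplashCookiesMiddleware": 723,
        "scrapy_splash.SplashMiddleware": 725,
        "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
    },
    "SPIDER_MIDDLEWARES": {
        "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
    },
}
REQUIRED_VALUES = {"DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter"}


def missing_splash_settings(settings):
    """Return a list of problems found in a settings dict.
    An empty list means every Splash-related entry is present."""
    problems = []
    if not settings.get("SPLASH_URL"):
        problems.append("SPLASH_URL is not set")
    for name, expected in REQUIRED_MIDDLEWARES.items():
        actual = settings.get(name) or {}
        for path, priority in expected.items():
            if actual.get(path) != priority:
                problems.append(f"{name}[{path!r}] should be {priority}")
    for name, value in REQUIRED_VALUES.items():
        if settings.get(name) != value:
            problems.append(f"{name} should be {value!r}")
    return problems
```

One way to use it is to import the project's settings module and pass `vars()` of that module to the function; any returned strings point at the entries still to be added.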

Command to run the scraper (replace "name" with your spider's name)


scrapy crawl "name"
