Parallel Image Downloads

이세현 · January 30, 2024

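After crawling a site for image URLs, downloading the images in parallel is much more efficient than fetching them one at a time, because each download spends most of its time waiting on the network.
With ThreadPoolExecutor, the downloads run concurrently across multiple threads.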
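To make the efficiency claim concrete, here is a minimal sketch that compares sequential and threaded execution. It does not touch the network: fake_download is a hypothetical stand-in that sleeps for 0.2 seconds to imitate waiting on a server, and the helper names are assumptions for illustration only.

```python
import concurrent.futures
import time


def fake_download(url):
    """Stand-in for a network request: sleeps instead of fetching anything."""
    time.sleep(0.2)
    return url


def run_sequential(urls):
    # One request after another; total time grows with the number of URLs.
    return [fake_download(u) for u in urls]


def run_threaded(urls):
    # One worker per URL here, so all the simulated waits overlap.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(urls)) as executor:
        return list(executor.map(fake_download, urls))


def timed(fn, urls):
    # Return the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = fn(urls)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example/photo{i}.jpg" for i in range(1, 5)]
    _, seq = timed(run_sequential, urls)
    _, par = timed(run_threaded, urls)
    print(f"sequential: {seq:.2f}s, threaded: {par:.2f}s")
```

Threads help here because the work is waiting, not computing. Real downloads will show a smaller or larger gap depending on the server and the connection.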

Downloading an image from a given URL

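import concurrent.futures
import urllib.request

def download_image(url, save_name):
    """Download the image at url and write it to save_name."""
    try:
        # Some servers reject requests that lack a browser-like User-Agent.
        request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        # The 10-second timeout is an added safeguard so a stalled server cannot hang a worker thread.
        with urllib.request.urlopen(request, timeout=10) as response:
            img_data = response.read()
        with open(save_name, "wb") as file:
            file.write(img_data)
    except Exception as e:
        # Report the failure and move on so one bad URL does not stop the batch.
        print(f"Error downloading {url}: {e}")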
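One way to sanity-check a function like this without network access is to point it at a file:// URL, which urllib.request can open just like an http URL. The sketch below is self-contained, so it repeats the function with two additions that are assumptions rather than part of the original: a timeout parameter and a True/False return value. check_with_local_file is a hypothetical helper name.

```python
import pathlib
import tempfile
import urllib.request


def download_image(url, save_name, timeout=10):
    """Fetch url and write the bytes to save_name; return True on success."""
    try:
        request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = response.read()
        with open(save_name, "wb") as file:
            file.write(data)
        return True
    except Exception as e:
        print(f"Error downloading {url}: {e}")
        return False


def check_with_local_file():
    """Copy a temporary file through download_image using a file:// URL."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "source.jpg"
        source.write_bytes(b"not a real image, just test bytes")
        target = pathlib.Path(tmp) / "copy.jpg"
        # as_uri() turns the absolute path into a file:// URL.
        ok = download_image(source.as_uri(), str(target))
        return ok and target.read_bytes() == source.read_bytes()


if __name__ == "__main__":
    print("local check passed:", check_with_local_file())
```

Returning a boolean instead of only printing makes it possible to count failures after a batch finishes.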

Loading the saved URLs and passing them to download_image

If the image URLs are saved in image_urls.txt, one per line as in the two sample lines below, main() saves the images in parallel to the chosen directory (folder_path).

https://example/photo1.jpg
https://example/photo2.jpg

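import os

def main():
    # Read one URL per line, skipping blank lines.
    with open('image_urls.txt', 'r') as file:
        urls = [line.strip() for line in file if line.strip()]
    folder_path = r'/home/example/folder'
    os.makedirs(folder_path, exist_ok=True)

    # os.path.join adds the separator; plain concatenation would yield /home/example/folder0.jpg.
    download_tasks = [(url, os.path.join(folder_path, f'{i}.{url.split(".")[-1]}')) for i, url in enumerate(urls)]
    print(f"{len(urls)} images")
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(lambda args: download_image(*args), download_tasks)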

if __name__ == "__main__":
    main()

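concurrent.futures.ThreadPoolExecutor(): used as a context manager, it creates the thread pool and shuts it down once all submitted tasks have finished.
executor.map(lambda args: download_image(*args), download_tasks): applies download_image to each (url, save_name) tuple, spreading the calls across the pool's threads.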
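As a variation on the call above, executor.map also accepts several iterables and passes one item from each to the function, which removes the need for the lambda. The sketch below combines that with a path helper that takes the extension from the URL path only, so a query string such as ?size=large does not end up in the file name. build_save_path, download_all, the default_ext fallback, and max_workers=8 are all assumptions for illustration; downloader is any function taking (url, save_path), such as download_image.

```python
import concurrent.futures
import os
import urllib.parse


def build_save_path(folder_path, index, url, default_ext=".jpg"):
    """Return folder_path/<index><ext>, taking ext from the URL path only."""
    path = urllib.parse.urlparse(url).path
    # Fall back to default_ext when the URL path has no extension.
    ext = os.path.splitext(path)[1] or default_ext
    return os.path.join(folder_path, f"{index}{ext}")


def download_all(urls, folder_path, downloader, max_workers=8):
    """Run downloader(url, save_path) for every URL on a thread pool."""
    save_paths = [build_save_path(folder_path, i, url) for i, url in enumerate(urls)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map pairs urls[i] with save_paths[i] and yields results in input order.
        results = list(executor.map(downloader, urls, save_paths))
    return list(zip(urls, save_paths, results))


if __name__ == "__main__":
    # A do-nothing downloader, so this demo runs without network access.
    report = download_all(
        ["https://example/photo1.jpg", "https://example/photo2.jpg"],
        "images",
        lambda url, save_path: True,
    )
    for url, save_path, ok in report:
        print(url, "->", save_path, "ok" if ok else "failed")
```

Wrapping the map call in list() forces every result to be collected, so an exception raised inside a worker surfaces in the caller instead of being silently dropped.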
