[django] Speeding Up Migration Code by Understanding QuerySet

SangHun · October 23, 2021

Example

Let's start with an example Django migration file.

The Mail model is being removed, and the following is part of the code that migrates all of its data to the Note model.

# 0001_migrate_mail.py

from django.db import migrations

# Module-level buffer shared by the two RunPython steps below.
mail_list = []

def get_old_mail(apps, _):
    # Use the historical model from the app registry, not a direct import.
    Mail = apps.get_model('my_app', 'Mail')
    for old_data in Mail.objects.all():
        mail_list.append(
            {
                "sender": old_data.sender_id,
                "title": old_data.title,
                "content": old_data.content,
            }
        )

def create_note(apps, _):
    Note = apps.get_model('my_app', 'Note')
    User = apps.get_model('my_app', 'User')

    new_data = []
    for mail in mail_list:
        new_data.append(
            Note(
                # One User query per mail.
                sender=User.objects.get(id=mail["sender"]),
                title=mail["title"],
                content=mail["content"],
                original_form="mail",
            )
        )
    # Insert all Note rows in bulk instead of saving them one by one.
    Note.objects.bulk_create(new_data)

class Migration(migrations.Migration):
    ...

    operations = [
        migrations.RunPython(get_old_mail),
        migrations.RunPython(create_note),
        ...
    ]

1. Is this migration code fast? If so, why?

create_note() does not create the model records one at a time. It uses the bulk_create() API, so in that respect it is fairly fast.
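To make the "one at a time" versus "in bulk" difference concrete, here is a small Django-free sketch using the standard sqlite3 module. It is only an illustration of the idea, not what bulk_create() does internally; the CountingConnection helper, the note table, and its columns are made up for this example.

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: wraps a connection and counts the statements we send."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.statements = 0

    def run(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params)


db = CountingConnection()
db.conn.execute("CREATE TABLE note (title TEXT, content TEXT)")

rows = [("hello", "a"), ("re: hello", "b"), ("bye", "c")]

# One INSERT per row: the number of statements grows with the number of rows.
for title, content in rows:
    db.run("INSERT INTO note (title, content) VALUES (?, ?)", (title, content))
per_row_statements = db.statements

# One multi-row INSERT: a single statement for the whole batch.
db.statements = 0
placeholders = ", ".join(["(?, ?)"] * len(rows))
flat_params = [value for row in rows for value in row]
db.run(f"INSERT INTO note (title, content) VALUES {placeholders}", flat_params)
bulk_statements = db.statements

total_rows = db.conn.execute("SELECT COUNT(*) FROM note").fetchone()[0]
print(per_row_statements, bulk_statements, total_rows)
```

Both approaches store the same rows; the batch version just needs far fewer round trips as the row count grows.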

2. Which part of this code takes the most time?

In practice, however, this migration can be very slow, and the more Mail data there is, the slower it gets.

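So where does it slow down?

It slows down in create_note(), at User.objects.get(id=mail["sender"]), where the result of a query is passed as the sender argument of the Note model.

No matter how fast a lookup by id is, every iteration still goes to the DB, and that adds up to an enormous delay.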
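The per-iteration cost can be sketched without Django. The snippet below uses sqlite3 and a hypothetical CountingDB wrapper (the users table and sample data are also made up) to show that one lookup per mail means the number of queries equals the number of mails.

```python
import sqlite3


class CountingDB:
    """Hypothetical helper: counts every SELECT we send to the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


db = CountingDB()
db.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "kim"), (2, "lee")]
)

mail_list = [
    {"sender": 1, "title": "a"},
    {"sender": 2, "title": "b"},
    {"sender": 1, "title": "c"},
]

# Same shape as the loop in create_note(): one lookup by id per mail.
sender_names = []
for mail in mail_list:
    rows = db.query("SELECT id, name FROM users WHERE id = ?", (mail["sender"],))
    sender_names.append(rows[0][1])

per_row_queries = db.queries
print(per_row_queries, sender_names)
```

Each individual lookup is cheap, but with hundreds of thousands of mails the loop makes hundreds of thousands of round trips, even when the same sender appears again and again.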

3. How can we make it faster?

Let's try fetching all the User data the migration needs from the DB in one go and keeping it in memory.

# Module-level buffer shared by the two RunPython steps.
mail_list = []

def get_old_mail(apps, _):
    Mail = apps.get_model('my_app', 'Mail')
    for old_data in Mail.objects.all():
        mail_list.append(
            {
                "sender": old_data.sender_id,
                "title": old_data.title,
                "content": old_data.content,
            }
        )

def create_note(apps, _):
    Note = apps.get_model('my_app', 'Note')
    User = apps.get_model('my_app', 'User')

    # Select every User referenced by the mails with a single filter.
    user_queryset = User.objects.filter(
        id__in=[mail["sender"] for mail in mail_list]
    )
    len(user_queryset)  # Evaluate queryset.

    new_data = []
    for mail in mail_list:
        new_data.append(
            Note(
                sender=user_queryset.get(id=mail["sender"]),
                title=mail["title"],
                content=mail["content"],
                original_form="mail",
            )
        )
    Note.objects.bulk_create(new_data)

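Here, all the User data needed for the migration is selected at once and assigned to a variable named user_queryset.

Right below that, though, there is a len(user_queryset) call.

Without this line, performance would inevitably be about the same as the original code.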

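The reason is that a QuerySet is lazy: creating it does not run a DB query; the query only runs when the QuerySet is evaluated.

In addition, the QuerySet.get() API fetches just a single record from the DB; it does not load the whole QuerySet.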
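These two behaviours can be modelled with a toy class. This is not Django's implementation, just a rough stand-in under my own assumptions: ToyQuerySet, its stats counter, and the sample data are all hypothetical. It mirrors what I understand of Django: nothing is fetched on creation, len() evaluates and caches the rows, and get() always performs its own single-record lookup, even on a QuerySet that has already been evaluated.

```python
class ToyQuerySet:
    """Toy stand-in for a lazy queryset; not Django's implementation."""

    def __init__(self, table, ids, stats):
        self._table = table          # dict id -> record, plays the role of the DB
        self._ids = ids              # ids selected by the "filter"
        self._stats = stats          # shared counter of simulated DB hits
        self._result_cache = None    # filled on first evaluation

    def _fetch_all(self):
        if self._result_cache is None:
            self._stats["hits"] += 1
            self._result_cache = [
                self._table[i] for i in self._ids if i in self._table
            ]

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def get(self, id):
        # Always a fresh single-record lookup; the cache is not consulted.
        self._stats["hits"] += 1
        return self._table[id]


stats = {"hits": 0}
table = {1: {"id": 1, "name": "kim"}, 2: {"id": 2, "name": "lee"}}
mail_senders = [1, 2, 1, 2]

qs = ToyQuerySet(table, set(mail_senders), stats)
hits_after_creation = stats["hits"]   # lazy: nothing fetched yet

len(qs)                               # forces evaluation, one hit
hits_after_len = stats["hits"]

for sender in mail_senders:
    qs.get(id=sender)                 # one extra hit per call
hits_after_get = stats["hits"]

# Build an in-memory index from the cached rows; lookups need no further hits.
users_by_id = {user["id"]: user for user in qs}
names = [users_by_id[sender]["name"] for sender in mail_senders]
hits_after_dict = stats["hits"]

print(hits_after_creation, hits_after_len, hits_after_get, hits_after_dict)
```

In a real migration the same idea would be to iterate over user_queryset once and index the rows in a dict keyed by id; Django's QuerySet.in_bulk() also returns such a mapping. Either way, it is worth checking the actual query count on your own data before relying on it.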