The N+1 query problem in Django (performance issue)

런던행 · May 26, 2020


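ORMs have their advantages, but a developer who writes an application without knowing about the N+1 query problem can easily run into performance issues. This post covers the N+1 query problem, which accounts for a large share of those issues.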

What is the N+1 query problem?

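It mainly arises at the application level. The application fetches N models with a single query and then iterates over them. Every time it accesses a related model on one of those N models, it goes back to the DB, so it ends up issuing N more queries (1 + N in total), which hurts performance.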
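To make the mechanism concrete, here is a minimal pure-Python sketch that imitates lazy relation loading without Django. FakeDB, LazyPlace, fetch_places, and fetch_restaurant are hypothetical stand-ins invented for illustration; they do nothing except count how many "queries" would reach the database.

```python
class FakeDB:
    """Hypothetical stand-in for a database: it only counts queries."""

    def __init__(self, restaurants_by_place):
        self.restaurants_by_place = restaurants_by_place
        self.query_count = 0

    def fetch_places(self):
        # The "1": a single query returns all N places.
        self.query_count += 1
        return [LazyPlace(self, pid) for pid in sorted(self.restaurants_by_place)]

    def fetch_restaurant(self, place_id):
        # The "+N": one query per place whose relation is accessed.
        self.query_count += 1
        return self.restaurants_by_place[place_id]


class LazyPlace:
    def __init__(self, db, place_id):
        self.db = db
        self.id = place_id

    @property
    def restaurant(self):
        # Simplification: every access goes to the DB (no caching, no prefetch).
        return self.db.fetch_restaurant(self.id)


db = FakeDB({i: f"Restaurant{i}" for i in range(1, 7)})
names = [place.restaurant for place in db.fetch_places()]
print(db.query_count)  # 7 = 1 (places) + 6 (one per place)
```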

reference

Example code that triggers N+1 in Django

The Place and Restaurant models

from django.db import models


class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=80)

    def __str__(self):
        return self.name


class Restaurant(models.Model):
    # One-to-one link to Place; related_name='restaurant' enables place.restaurant
    place = models.OneToOneField(Place, on_delete=models.CASCADE, related_name='restaurant')
    name = models.CharField(max_length=50)
    # Field name kept as-is because the SQL logs below use this column name
    severs_pizza = models.BooleanField(default=False)

    def __str__(self):
        return self.name

Place and Restaurant have a one-to-one relationship.

We create only six records for each model. The session below was run after those records had already been created.

>>> for place in Place.objects.all():
...     print(place.restaurant.name)
... 
TestRestaruant1
TestRestaruant2
TestRestaruant3
TestRestaruant4
TestRestaruant5
TestRestaruant6

The queries generated by this loop are as follows.

from django.db import connection
# connection.queries is only recorded when settings.DEBUG is True
print(connection.queries)

[{'sql': 'SELECT @@SQL_AUTO_IS_NULL', 'time': '0.000'}, 
{'sql': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED', 'time': '0.000'},
{'sql': 'SELECT `photo_place`.`id`, `photo_place`.`name`, `photo_place`.`address` FROM `photo_place`', 'time': '0.000'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` = 1 LIMIT 21', 'time': '0.000'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` = 2 LIMIT 21', 'time': '0.000'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` = 3 LIMIT 21', 'time': '0.000'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` = 4 LIMIT 21', 'time': '0.000'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` = 5 LIMIT 21', 'time': '0.001'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` = 6 LIMIT 21', 'time': '0.000'}]

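Because the for loop runs 6 times, the SELECT on the photo_restaurant table also runs 6 times, on top of the single SELECT on photo_place.
If the loop ran 10,000 times, photo_restaurant would be queried 10,000 times. And if it ran a million times, the response time... the gates of hell would open, right?? This situation is called the relational N+1 problem.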
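To see how the count scales without setting up Django, the same loop can be replayed against an in-memory SQLite database using Python's standard sqlite3 module. This is a sketch under assumptions: build_db and naive_loop are hypothetical helpers, the schema is a simplified version of the photo_place/photo_restaurant tables in the log above, and sqlite3's Connection.set_trace_callback stands in for Django's connection.queries.

```python
import sqlite3


def build_db(n):
    """Hypothetical in-memory DB: n places, one restaurant per place (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photo_place (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE photo_restaurant (id INTEGER PRIMARY KEY, "
        "place_id INTEGER UNIQUE REFERENCES photo_place(id), name TEXT)"
    )
    conn.executemany("INSERT INTO photo_place VALUES (?, ?)",
                     [(i, f"Place{i}") for i in range(1, n + 1)])
    conn.executemany("INSERT INTO photo_restaurant VALUES (?, ?, ?)",
                     [(i, i, f"TestRestaurant{i}") for i in range(1, n + 1)])
    return conn


def naive_loop(conn):
    """N+1 pattern: one SELECT for all places, then one SELECT per place."""
    log = []
    conn.set_trace_callback(log.append)  # record each executed statement
    place_ids = [row[0] for row in conn.execute("SELECT id FROM photo_place")]
    names = []
    for place_id in place_ids:
        row = conn.execute(
            "SELECT name FROM photo_restaurant WHERE place_id = ?", (place_id,)
        ).fetchone()
        names.append(row[0])
    conn.set_trace_callback(None)
    # Count only SELECTs, ignoring any transaction statements sqlite3 may issue.
    selects = [s for s in log if s.lstrip().upper().startswith("SELECT")]
    return names, len(selects)


for n in (6, 10000):
    names, selects = naive_loop(build_db(n))
    print(n, selects)  # 6 -> 7 SELECTs, 10000 -> 10001 SELECTs
```

The query count is always 1 + N, so it grows linearly with the number of rows the loop touches.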

Now let's switch to eager loading (prefetch_related):

>>> for place in Place.objects.prefetch_related('restaurant').all():
...     print(place.restaurant.name)
... 
TestRestaruant1
TestRestaruant2
TestRestaruant3
TestRestaruant4
TestRestaruant5
TestRestaruant6

To the eye, the output is the same with or without prefetch_related, but with prefetch_related the generated queries look like this:

[{'sql': 'SELECT @@SQL_AUTO_IS_NULL', 'time': '0.000'},
{'sql': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED', 'time': '0.000'},
{'sql': 'SELECT `photo_place`.`id`, `photo_place`.`name`, `photo_place`.`address` FROM `photo_place`', 'time': '0.000'},
{'sql': 'SELECT `photo_restaurant`.`id`, `photo_restaurant`.`place_id`, `photo_restaurant`.`name`, `photo_restaurant`.`severs_pizza` FROM `photo_restaurant` WHERE `photo_restaurant`.`place_id` IN (1, 2, 3, 4, 5, 6)', 'time': '0.000'}]

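Without prefetch_related, the restaurant lookup runs 6 times. With prefetch_related, those 6 queries are replaced by a single query using IN (1, 2, 3, 4, 5, 6), which cuts the number of queries sharply (here 2 data queries instead of 7).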
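Conceptually, prefetch_related works in two steps: load the places, then load all matching restaurants with one IN query and do the "joining" in Python. The sketch below replays that idea with the standard sqlite3 module. It is not Django's implementation; build_db and prefetch_loop are hypothetical helpers on a simplified version of the tables in the log above.

```python
import sqlite3


def build_db(n):
    """Hypothetical in-memory DB: n places, one restaurant per place (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photo_place (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE photo_restaurant (id INTEGER PRIMARY KEY, "
        "place_id INTEGER UNIQUE REFERENCES photo_place(id), name TEXT)"
    )
    conn.executemany("INSERT INTO photo_place VALUES (?, ?)",
                     [(i, f"Place{i}") for i in range(1, n + 1)])
    conn.executemany("INSERT INTO photo_restaurant VALUES (?, ?, ?)",
                     [(i, i, f"TestRestaurant{i}") for i in range(1, n + 1)])
    return conn


def prefetch_loop(conn):
    """Prefetch pattern: one SELECT for places, one SELECT ... IN (...) for restaurants."""
    log = []
    conn.set_trace_callback(log.append)  # record each executed statement
    place_ids = [row[0] for row in conn.execute("SELECT id FROM photo_place")]
    placeholders = ", ".join("?" for _ in place_ids)
    rows = conn.execute(
        f"SELECT place_id, name FROM photo_restaurant WHERE place_id IN ({placeholders})",
        place_ids,
    ).fetchall()
    conn.set_trace_callback(None)
    # Attach restaurants to places in Python, keyed by place_id.
    by_place = dict(rows)
    names = [by_place[pid] for pid in place_ids]
    selects = [s for s in log if s.lstrip().upper().startswith("SELECT")]
    return names, len(selects)


for n in (6, 500):
    names, selects = prefetch_loop(build_db(n))
    print(n, selects)  # 2 SELECTs regardless of n
```

SQLite caps the number of bound parameters per statement, so this sketch sticks to a modest N; the point is that the query count stays at 2 instead of growing with N.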
