[JPA] The N+1 Query Problem

RID · June 28, 2024

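Background

Once you start mapping associations with JPA, there is one problem you will almost inevitably run into: the N+1 query problem, which is today's topic.

When you load data that has associations, far more queries are issued than you expected. Because this causes performance problems that are easy to miss, it is something you really have to think about!

Looking at the expression N+1 alone, you might think, "Isn't that just one extra query on top of N?" Adding a constant number of queries to N doesn't sound like a big deal. In practice, though, it is easier to think of it as a 1+N problem rather than N+1.

The real N+1 problem is that a read that could be done with 1 query triggers N additional queries!

It occurs with both @OneToMany and @ManyToOne associations. To look at the problem, let's create the two entities below.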

Todo.kt

import jakarta.persistence.*
import java.time.LocalDate

@Entity
@Table(name = "todo")
class Todo(

    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "todo_id")
    var id: Long? = null,

    var contents: String,
    val date: LocalDate = LocalDate.now(),
    var isComplete: Boolean = false,

    // Inverse side of Comment.todo; EAGER loads the comments together with each Todo
    @OneToMany(mappedBy = "todo", fetch = FetchType.EAGER)
    val comments: MutableList<Comment> = mutableListOf()
)

Comment.kt

import jakarta.persistence.*

@Entity
@Table(name = "comment")
class Comment(

    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "comment_id")
    var id: Long? = null,

    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "todo_id")
    val todo: Todo,
    val contents: String
)

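Since this is a bidirectional association, you might wonder whether the problem only shows up with bidirectional mappings. The mapping is bidirectional only so that I can show the problem on both the @OneToMany side and the @ManyToOne side; it happens with unidirectional associations too!

The N+1 problem scenario

To set up the scenario, I used an ApplicationRunner, as shown below, to insert data in advance.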

DatabaseInitializer.kt


import org.springframework.boot.ApplicationArguments
import org.springframework.boot.ApplicationRunner
import org.springframework.stereotype.Component

// Inserts 4 todos and 7 comments at startup so the queries below have data to load
@Component
class DatabaseInitializer(
    private val todoRepository: TodoRepository,
    private val commentRepository: CommentRepository
) : ApplicationRunner {
    override fun run(args: ApplicationArguments) {
        val todo1 = todoRepository.save(Todo(contents = "Todo 1"))
        val todo2 = todoRepository.save(Todo(contents = "Todo 2"))
        val todo3 = todoRepository.save(Todo(contents = "Todo 3"))
        val todo4 = todoRepository.save(Todo(contents = "Todo 4"))

        commentRepository.save(Comment(todo = todo1, contents = "Todo 1 - comment 1"))
        commentRepository.save(Comment(todo = todo1, contents = "Todo 1 - comment 2"))
        commentRepository.save(Comment(todo = todo1, contents = "Todo 1 - comment 3"))

        commentRepository.save(Comment(todo = todo2, contents = "Todo 2 - comment 1"))

        commentRepository.save(Comment(todo = todo3, contents = "Todo 3 - comment 1"))

        commentRepository.save(Comment(todo = todo4, contents = "Todo 4 - comment 1"))
        commentRepository.save(Comment(todo = todo4, contents = "Todo 4 - comment 2"))
    }
}

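Now think about fetching every Todo. If you were loading all Todos together with their associated Comments straight from the database, without JPA's help, what SQL would you write?

Most likely you would use a join to bring everything back in one go, like this: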

SELECT * FROM todo t JOIN comment c ON t.todo_id = c.todo_id

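It would be great if JPA were smart enough to generate this query on its own, but in reality it loads the two sets of associated data in the following steps.

First, fetch only the Todos, without a join: SELECT * FROM todo
Then, check the ids of all N Todos that came back and, for each id, fetch the Comments that have it as their FK from the comment table:
- SELECT * FROM comment WHERE todo_id = 1
- SELECT * FROM comment WHERE todo_id = 2
  ...
- SELECT * FROM comment WHERE todo_id = N-1
- SELECT * FROM comment WHERE todo_id = N
So on top of the first query, one additional query is issued for each of the N Todos it returned!

You might think this is tolerable, but imagine that Comment is itself associated with yet another entity. The number of queries becomes truly horrifying.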
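The steps above can be imitated without a database to make the counting concrete. The following is only a sketch: FakeDb, TodoRow, CommentRow, and the function names are hypothetical stand-ins for the two tables, not anything JPA provides. Each method that stands in for a SQL statement bumps a counter.

```kotlin
// In-memory stand-ins for the todo and comment tables (hypothetical names).
data class TodoRow(val id: Long, val contents: String)
data class CommentRow(val id: Long, val todoId: Long, val contents: String)

class FakeDb(private val todos: List<TodoRow>, private val comments: List<CommentRow>) {
    var queryCount = 0
        private set

    // Stands in for: SELECT * FROM todo
    fun selectAllTodos(): List<TodoRow> {
        queryCount++
        return todos
    }

    // Stands in for: SELECT * FROM comment WHERE todo_id = ?
    fun selectCommentsByTodoId(todoId: Long): List<CommentRow> {
        queryCount++
        return comments.filter { it.todoId == todoId }
    }
}

// Loads the way the steps above describe: 1 query for the todos, then 1 more per todo.
fun loadTodosWithComments(db: FakeDb): Map<TodoRow, List<CommentRow>> =
    db.selectAllTodos().associateWith { db.selectCommentsByTodoId(it.id) }

// Same shape as the data inserted at startup: 4 todos with 3, 1, 1 and 2 comments.
fun sampleDb(): FakeDb {
    val todos = (1L..4L).map { TodoRow(it, "todo $it") }
    val commentCounts = mapOf(1L to 3, 2L to 1, 3L to 1, 4L to 2)
    var nextId = 1L
    val comments = commentCounts.flatMap { (todoId, count) ->
        (1..count).map { CommentRow(nextId++, todoId, "todo $todoId - comment $it") }
    }
    return FakeDb(todos, comments)
}
```

Calling loadTodosWithComments(sampleDb()) touches the fake database 5 times: 1 query for the todos plus 4 for their comments, which is the 1+N shape with N = 4.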

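Confirming the N+1 problem

Now let's check whether the N+1 problem occurs in the @OneToMany case and in the @ManyToOne case.

OneToMany

When todoRepository.findAll() loads the list of Todos, one query really ought to be enough. But when you actually run it, the following queries are issued.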

Hibernate: select t1_0.todo_id,t1_0.contents,t1_0.date,t1_0.is_complete from todo t1_0
Hibernate: select c1_0.todo_id,c1_0.comment_id,c1_0.contents from comment c1_0 where c1_0.todo_id=?
2024-06-28T19:36:35.447+09:00 TRACE 34356 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [4]
Hibernate: select c1_0.todo_id,c1_0.comment_id,c1_0.contents from comment c1_0 where c1_0.todo_id=?
2024-06-28T19:36:35.448+09:00 TRACE 34356 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [3]
Hibernate: select c1_0.todo_id,c1_0.comment_id,c1_0.contents from comment c1_0 where c1_0.todo_id=?
2024-06-28T19:36:35.448+09:00 TRACE 34356 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [2]
Hibernate: select c1_0.todo_id,c1_0.comment_id,c1_0.contents from comment c1_0 where c1_0.todo_id=?
2024-06-28T19:36:35.449+09:00 TRACE 34356 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [1]

As described above, there is one query that fetches the Todos, and then the associated Comments are fetched separately, one todo_id at a time.

ManyToOne

The same thing happens when you query Comment first.

Hibernate: select c1_0.comment_id,c1_0.contents,c1_0.todo_id from comment c1_0
Hibernate: select t1_0.todo_id,t1_0.contents,t1_0.date,t1_0.is_complete,c1_0.todo_id,c1_0.comment_id,c1_0.contents from todo t1_0 left join comment c1_0 on t1_0.todo_id=c1_0.todo_id where t1_0.todo_id=?
2024-06-28T19:38:24.702+09:00 TRACE 34356 --- [nio-8080-exec-5] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [1]
Hibernate: select t1_0.todo_id,t1_0.contents,t1_0.date,t1_0.is_complete,c1_0.todo_id,c1_0.comment_id,c1_0.contents from todo t1_0 left join comment c1_0 on t1_0.todo_id=c1_0.todo_id where t1_0.todo_id=?
2024-06-28T19:38:24.703+09:00 TRACE 34356 --- [nio-8080-exec-5] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [2]
Hibernate: select t1_0.todo_id,t1_0.contents,t1_0.date,t1_0.is_complete,c1_0.todo_id,c1_0.comment_id,c1_0.contents from todo t1_0 left join comment c1_0 on t1_0.todo_id=c1_0.todo_id where t1_0.todo_id=?
2024-06-28T19:38:24.704+09:00 TRACE 34356 --- [nio-8080-exec-5] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [3]
Hibernate: select t1_0.todo_id,t1_0.contents,t1_0.date,t1_0.is_complete,c1_0.todo_id,c1_0.comment_id,c1_0.contents from todo t1_0 left join comment c1_0 on t1_0.todo_id=c1_0.todo_id where t1_0.todo_id=?
2024-06-28T19:38:24.704+09:00 TRACE 34356 --- [nio-8080-exec-5] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [4]

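After all the Comments are loaded, the todo_id values in the FK column are collected, and a separate query is issued for each todo_id to load the corresponding Todo. (Each of these Todo queries also left-joins comment because Todo.comments is mapped as EAGER.)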
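One detail in the log is worth spelling out: seven Comments were loaded, but only four Todo queries followed. My reading is that a Todo already loaded into the persistence context is reused, so the number of extra queries follows the number of distinct todo_id values rather than the number of Comments. A tiny sketch of that count, with hypothetical names:

```kotlin
// The todo_id FK values of the 7 comments inserted at startup.
val commentTodoIds: List<Long> = listOf(1L, 1L, 1L, 2L, 3L, 4L, 4L)

// Extra lookups needed if each distinct parent id is loaded exactly once.
fun parentLookups(foreignKeys: List<Long>): Int = foreignKeys.distinct().size
```

For the sample data, parentLookups(commentTodoIds) is 4, which matches the four Todo queries in the log above.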

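How do we fix it?

Now that we've confirmed the problem, it's time to fix it! The problem comes from the queries JPA generates automatically, so JPA itself provides ways to deal with it.

1. Setting a batch size

This is not a fundamental fix, but it reduces the N additional queries to roughly N/batch_size queries, rounded up.

In the first example, the Todo query returned four todo_id values: 1, 2, 3, and 4. Batch fetching puts all four into a single IN condition so that they are loaded with one query.

Let's start with the setup. All you need is the following option in application.yml. (Pick a size that suits you! I used 10 for this example.)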

spring:
  jpa:
    properties:
      hibernate:
        default_batch_fetch_size: 10

Now let's look at the resulting queries!

Hibernate: select t1_0.todo_id,t1_0.contents,t1_0.date,t1_0.is_complete from todo t1_0
Hibernate: select c1_0.todo_id,c1_0.comment_id,c1_0.contents from comment c1_0 where c1_0.todo_id in (?,?,?,?,?,?,?,?,?,?)
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (1:BIGINT) <- [4]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (2:BIGINT) <- [1]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (3:BIGINT) <- [2]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (4:BIGINT) <- [3]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (5:BIGINT) <- [null]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (6:BIGINT) <- [null]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (7:BIGINT) <- [null]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (8:BIGINT) <- [null]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (9:BIGINT) <- [null]
2024-06-28T19:45:31.220+09:00 TRACE 13300 --- [nio-8080-exec-1] org.hibernate.orm.jdbc.bind              : binding parameter (10:BIGINT) <- [null]

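Of the 1+N queries, the N part has been reduced to a single query, because all 4 ids fit into one batch of 10! The six unused slots in the IN list are simply bound to null.

Strictly speaking, batch size only optimizes how the queries are sent; it is not a fundamental solution to the N+1 problem.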
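To make the "rounded up" arithmetic concrete, here is a small sketch. The function names are hypothetical, and it assumes the ids are simply split into consecutive groups of batch_size; the exact grouping Hibernate uses may differ between versions.

```kotlin
// Number of IN queries needed to load the children of n parents.
fun batchedQueryCount(n: Int, batchSize: Int): Int {
    require(n >= 0 && batchSize > 0) { "n must be >= 0 and batchSize > 0" }
    return (n + batchSize - 1) / batchSize // integer form of ceil(n / batchSize)
}

// Splits parent ids into the groups that would each fill one IN (...) list.
fun inClauseGroups(parentIds: List<Long>, batchSize: Int): List<List<Long>> =
    parentIds.chunked(batchSize)
```

With 4 todos and a batch size of 10 this gives 1 extra query, as in the log above. With 25 todos it would give 3 (groups of 10, 10, and 5), so the total goes from 1+25 to 1+3.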

2. Fetch join

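JPA provides fetch join, a kind of join that does not exist in the database itself. You can think of it as a JPQL option that tells JPA up front which associated entities you need, so that it loads them with the join query we wanted.

Instead of the findAll() method that the repository provides automatically, let's use a method written in JPQL, as below.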

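import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.data.jpa.repository.Query
import org.springframework.stereotype.Repository

@Repository
interface TodoRepository : JpaRepository<Todo, Long> {

    // JOIN FETCH loads each Todo together with its comments in one query
    @Query("SELECT todo FROM Todo todo JOIN FETCH todo.comments")
    fun findAllWithFetchJoin(): List<Todo>
}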

Hibernate: select t1_0.todo_id,c1_0.todo_id,c1_0.comment_id,c1_0.contents,t1_0.contents,t1_0.date,t1_0.is_complete from todo t1_0 join comment c1_0 on t1_0.todo_id=c1_0.todo_id

Now the same read is completed with a single join query!
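One thing to keep in mind is that the generated SQL uses a plain (inner) join, so a Todo with no comments at all would not appear in the result. Every Todo in the sample data has at least one comment, so nothing is lost here. If comment-less Todos must be kept, JPQL also accepts LEFT JOIN FETCH. The difference can be sketched on in-memory lists; all names below are hypothetical.

```kotlin
data class TodoRow(val id: Long, val contents: String)
data class CommentRow(val id: Long, val todoId: Long, val contents: String)

// Inner join semantics: a todo appears only if it has at least one comment.
fun innerJoin(todos: List<TodoRow>, comments: List<CommentRow>): List<Pair<TodoRow, CommentRow>> =
    todos.flatMap { todo -> comments.filter { it.todoId == todo.id }.map { todo to it } }

// Left join semantics: a todo without comments still appears, paired with null.
fun leftJoin(todos: List<TodoRow>, comments: List<CommentRow>): List<Pair<TodoRow, CommentRow?>> =
    todos.flatMap { todo ->
        val matches: List<CommentRow?> = comments.filter { it.todoId == todo.id }
        val rows: List<CommentRow?> = if (matches.isEmpty()) listOf(null) else matches
        rows.map { todo to it }
    }
```

Given three todos where only the first two have comments, innerJoin returns rows for todos 1 and 2 only, while leftJoin also returns a row for todo 3 with a null comment.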

3. Entity Graph

Like fetch join, this is a way to tell JPA which associated entities to load before it generates the SQL.

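import org.springframework.data.jpa.repository.EntityGraph
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.data.jpa.repository.Query
import org.springframework.stereotype.Repository

@Repository
interface TodoRepository : JpaRepository<Todo, Long> {

    // attributePaths names the associations to load together with Todo
    @EntityGraph(attributePaths = ["comments"])
    @Query("SELECT todo FROM Todo todo")
    fun findAllWithEntityGraph(): List<Todo>
}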

@EntityGraph tells JPA that comments is an association to load together with Todo, and this approach also results in a single SELECT query that uses a join!

Hibernate: select t1_0.todo_id,c1_0.todo_id,c1_0.comment_id,c1_0.contents,t1_0.contents,t1_0.date,t1_0.is_complete from todo t1_0 left join comment c1_0 on t1_0.todo_id=c1_0.todo_id

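Looking at the query, the entity graph version uses an outer join. A join over a collection returns the parent row once per child, so you have to watch out for duplicated data. Also, as the number of associations grows, the entity graph itself becomes more complex, so I think fetch join is the slightly better way to solve the problem!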
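To see where the duplication comes from, it helps to look at the flat rows a todo-comment join returns and how they collapse back into one entry per todo. This is only an illustration with hypothetical names. As far as I know, Hibernate 6 already returns each parent entity once for a fetch-joined collection, while older versions needed DISTINCT in the JPQL, so check the behavior of the version you use.

```kotlin
// One row of a todo-comment join result; commentId is null when a left join found no comment.
data class JoinedRow(val todoId: Long, val commentId: Long?)

// The 7 rows the join returns for the sample data: todo 1 appears 3 times, todo 4 twice.
fun sampleJoinedRows(): List<JoinedRow> = listOf(
    JoinedRow(1L, 1L), JoinedRow(1L, 2L), JoinedRow(1L, 3L),
    JoinedRow(2L, 4L),
    JoinedRow(3L, 5L),
    JoinedRow(4L, 6L), JoinedRow(4L, 7L),
)

// Collapses the joined rows into one entry per todo, keeping first-seen order.
fun groupByTodo(rows: List<JoinedRow>): Map<Long, List<Long>> =
    rows.groupBy({ it.todoId }, { it.commentId })
        .mapValues { (_, ids) -> ids.filterNotNull() }
```

The join result has 7 rows for 4 todos. After grouping there are 4 entries, and todo 1 maps to comment ids 1, 2, and 3.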

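Things to watch out for

1. Is it caused by FetchType.EAGER?

There are two fetch strategies for associated entities, lazy and eager, and it is easy to assume that the N+1 problem is caused by eager fetching.

In fact, if you switch to lazy fetching, the N queries are not issued until you actually use the associated entities, so it can look as if the problem has been solved.

But at the moment you actually use the entities, the same additional queries are issued. Only the timing has moved to the point of use; the underlying problem has not been solved at all!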
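The "only the timing changes" point can be sketched with Kotlin's own lazy delegate playing the role of a LAZY association. QueryLog, LazyTodo, and findAllLazily are hypothetical stand-ins, not JPA types.

```kotlin
// Counts simulated SQL statements.
class QueryLog { var count = 0 }

class LazyTodo(val id: Long, log: QueryLog) {
    // The "query" for the comments runs on first access only, like a LAZY collection.
    val comments: List<String> by lazy {
        log.count++
        listOf("comment of todo $id")
    }
}

// Stands in for SELECT * FROM todo with LAZY comments: one query, nothing else yet.
fun findAllLazily(n: Int, log: QueryLog): List<LazyTodo> {
    log.count++
    return (1..n).map { LazyTodo(it.toLong(), log) }
}
```

Right after findAllLazily(4, log) the counter is 1, which looks like a fix. Once the comments of each todo are read, the counter reaches 5, the same 1+N total as before.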

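2. Pagination + fetch join

I said fetch join is a good way to solve the N+1 problem, but it runs into trouble when you apply pagination at the same time.

The whole point of pagination is to load data in pieces, but when it is combined with a fetch join on a collection (such as todo.comments), the pagination is not applied in the SQL and a query that loads every record is executed.

All of the data is loaded into memory, and the paging is then done in memory.

So in this case it can be better to use batch_size to reduce the number of queries than to use fetch join.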
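A sketch of why the limit cannot simply be pushed into the joined SQL: a row-level limit cuts join rows, not todos. The names below are hypothetical, and the in-memory paging function only imitates the idea of grouping everything first and slicing afterwards.

```kotlin
// One row of the todo-comment join result.
data class FlatRow(val todoId: Long, val commentId: Long)

// The 7 join rows for the sample data.
fun sampleFlatRows(): List<FlatRow> = listOf(
    FlatRow(1L, 1L), FlatRow(1L, 2L), FlatRow(1L, 3L),
    FlatRow(2L, 4L),
    FlatRow(3L, 5L),
    FlatRow(4L, 6L), FlatRow(4L, 7L),
)

// What a SQL-level LIMIT would do on the join result: it cuts rows, not todos.
fun limitRows(rows: List<FlatRow>, limit: Int): Map<Long, List<Long>> =
    rows.take(limit).groupBy({ it.todoId }, { it.commentId })

// In-memory paging: group every row into todos first, then take one page of todos.
fun pageInMemory(rows: List<FlatRow>, offset: Int, limit: Int): Map<Long, List<Long>> =
    rows.groupBy({ it.todoId }, { it.commentId })
        .entries.drop(offset).take(limit)
        .associate { it.key to it.value }
```

Asking for a "page of 2" with limitRows returns only todo 1, and with just two of its three comments. pageInMemory returns todos 1 and 2 with all their comments, but only after every row has been read, which is the cost described above.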

