[Spring] N+1 문제와 해결법

Beanzinu·2022년 10월 21일

Spring

스프링부트

목록 보기

7/7

N+1 문제

연관관계에서 발생하는 문제로 연관 관계가 설정된 엔티티를 조회할 경우에 엔티티의 개수(n)만큼 연관관계 조회 쿼리가 추가적으로 나가는 상황을 말한다.

연관관계가 설정된 엔티티

흔히 객체지향 설계에서 객체 간의 연관을 표현하는 것을 연관관계라 한다.
@OnetoMany, @ManyToOne , @OneToOne 과 같은 어노테이션을 사용하는 필드들을 일컫는다.

문제 발생

1. 먼저 객체(도메인) 간의 연관관계는 다음과 같다.

게시물(Post)이 댓글들(PostComment)를 1:N 관계로 연관하여 가지고 있다.

public class Post {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id")
    @NotNull
    private User user;

    @OneToMany(mappedBy = "post", cascade = CascadeType.ALL)
    private List<PostComment> postComments = new ArrayList<>();
    ...
}
public class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    @JoinColumn(name = "post_id")
    private Post post;
    ...
}

테스트를 위해 테스트데이터 미리 생성

2. 유저아이디를 통해 특정 유저가 작성한 게시물(Post)들을 모두 조회한다.

기본 JpaRepository 인터페이스 함수 활용
Post Table 입장에서 User Table을 left outer join 하여 where user.id = 2 조건문을 통해 결과를 가져온다.

// 1. userId를 통해 모든 Post 조회
List<Post> allByUserId = postRepository.findAllByUserId(2L);

// 쿼리
select
        post0_.id as id1_0_,
        post0_.content as content2_0_,
        post0_.search_counts as search_c3_0_,
        post0_.solved as solved4_0_,
        post0_.tags as tags5_0_,
        post0_.title as title6_0_,
        post0_.user_id as user_id8_0_,
        post0_.views as views7_0_ 
    from
        post post0_ 
    left outer join
        user user1_ 
            on post0_.user_id=user1_.id 
    where
        user1_.id=?

3. 찾은 게시물들에 각각 달린 댓글들의 개수를 조회해보자.

이때 개수가 아니라 댓글들을 몇개를 조회하든 상관없이 똑같이 N+1문제가 발생한다.
N+1문제는 연관된 객체들을 조회할 경우 발생하는 문제이기 때문이다.

// 2. 조회된 Post 에 달린 댓글들(PostComment) 조회
for( Post p : allByUserId ){
	p.getPostComments().size();
}

// 쿼리 : 총 11번의 select문이 발생!
Hibernate: 
    select
        postcommen0_.post_id as post_id3_1_0_,
        postcommen0_.id as id1_1_0_,
        postcommen0_.id as id1_1_1_,
        postcommen0_.post_id as post_id3_1_1_,
        postcommen0_.selected as selected2_1_1_ 
    from
        post_comment postcommen0_ 
    where
        postcommen0_.post_id=?


...


Hibernate: 
    select
        postcommen0_.post_id as post_id3_1_0_,
        postcommen0_.id as id1_1_0_,
        postcommen0_.id as id1_1_1_,
        postcommen0_.post_id as post_id3_1_1_,
        postcommen0_.selected as selected2_1_1_ 
    from
        post_comment postcommen0_ 
    where
        postcommen0_.post_id=?

문제발생 이유

쿼리문에 집중해보자.

Hibernate: 
    select
        postcommen0_.post_id as post_id3_1_0_,
        postcommen0_.id as id1_1_0_,
        postcommen0_.id as id1_1_1_,
        postcommen0_.post_id as post_id3_1_1_,
        postcommen0_.selected as selected2_1_1_ 
    from
        post_comment postcommen0_ 
    where
        postcommen0_.post_id=?

내가 조회한 Post들의 결과는 총 11개가 나왔다.
이때 각 Post들은 다른 PK들을 가지고 해당 PK를 댓글인 PostComment는 FK로 관리하고 있을 것이다.

Post 입장에서 댓글들을 모두 조회하려면?

where postcomment.post_id = ?

조회된 Post의 PK를 FK로 가지는 PostComment들을 가져오는 수 밖에 없다.
그래서 11개의 Post에 대하여 11번의 개별 쿼리가 나간 것이다.
위의 결과쿼리들을 상세히 보면 이러한 형태일 것이다.

select * from PostComment postcommen0_ where postcommen0_.id = 1
select * from PostComment postcommen0_ where postcommen0_.id = 2
...
select * from PostComment postcommen0_ where postcommen0_.id = 11

해결법

1. Fetch Join

Post Table을 조회할때 PostComment Table을 inner join하여 해당되는 결과를 함께 가져오는 것이다.

// PostRepository.java
@Query(value = "select p from Post p " + 
            "join fetch p.postComments "
            + "where p.user.id=:id")
    List<Post> findAllByUserId(@Param(value = "id") Long id);

// 쿼리
Hibernate: 
    select
        post0_.id as id1_0_0_,
        postcommen1_.id as id1_1_1_,
        post0_.content as content2_0_0_,
        post0_.search_counts as search_c3_0_0_,
        post0_.solved as solved4_0_0_,
        post0_.tags as tags5_0_0_,
        post0_.title as title6_0_0_,
        post0_.user_id as user_id8_0_0_,
        post0_.views as views7_0_0_,
        postcommen1_.post_id as post_id3_1_1_,
        postcommen1_.selected as selected2_1_1_,
        postcommen1_.post_id as post_id3_1_0__,
        postcommen1_.id as id1_1_0__ 
    from
        post post0_ 
    inner join
        post_comment postcommen1_ 
            on post0_.id=postcommen1_.post_id 
    where
        post0_.user_id=?

장점

간단한 JPQL 작성으로 1번의 쿼리로 원하는 결과를 가져올 수 있다.

단점

카테시안 곱(Cartesian Product) 만큼 중복된 결과가 생겨 데이터가 커질 수 있다.

중복된 결과의 경우 JPQL의 distinct를 통해 해결가능하다.

정해진 쿼리로 가져오기 때문에 페이징 쿼리가 불가능하다.
설정해놓은 FetchType은 작동하지 않는다.

2. EntityGraph

EntityGraph란
엔티티들은 서로 연관되어 있는 관계가 보통이며 이 관계는 그래프로 표현이 가능합니다. EntityGraph는 JPA가 어떤 엔티티를 불러올 때 이 엔티티와 관계된 엔티티를 불러올 것인지에 대한 정보를 제공합니다.

Fetch Join과 비슷하나 inner join이 아닌 left outer join으로 가져온다.
사실상 Fetch Join 과 비슷한 방식이다.

// PostReposiotry.java
@EntityGraph(attributePaths = {"postComments"})
List<Post> findAllByUserId(Long id);

// 쿼리
Hibernate: 
    select
        post0_.id as id1_0_0_,
        postcommen2_.id as id1_1_1_,
        post0_.content as content2_0_0_,
        post0_.search_counts as search_c3_0_0_,
        post0_.solved as solved4_0_0_,
        post0_.tags as tags5_0_0_,
        post0_.title as title6_0_0_,
        post0_.user_id as user_id8_0_0_,
        post0_.views as views7_0_0_,
        postcommen2_.post_id as post_id3_1_1_,
        postcommen2_.selected as selected2_1_1_,
        postcommen2_.post_id as post_id3_1_0__,
        postcommen2_.id as id1_1_0__ 
    from
        post post0_ 
    left outer join
        user user1_ 
            on post0_.user_id=user1_.id 
    left outer join
        post_comment postcommen2_ 
            on post0_.id=postcommen2_.post_id 
    where
        user1_.id=?

3. FetchMode.SUBSELECT

해당 엔티티 조회를 조회하는 쿼리는 그대로(LAZY) 발생하고 이후 컬렉션을 조회할때 서브쿼리를 통해 조회하는 방법이다.

기존 Post를 조회할때 PostComment를 LAZY 방식으로 가져오기 때문에 가져오지 않는다.
이후 Post에 연관된 PostComment를 조회 시 기존 쿼리를 서브쿼리로 활용하여 where 조건문에 작성한다.

// Post.java
@Fetch(FetchMode.SUBSELECT)
    @OneToMany(mappedBy = "post", cascade = CascadeType.ALL)
    private List<PostComment> postComments = new ArrayList<>();
    
// PostRepository.java
List<Post> findAllByUserId(Long id);

// 쿼리
Hibernate: 
    select
        post0_.id as id1_0_,
        post0_.content as content2_0_,
        post0_.search_counts as search_c3_0_,
        post0_.solved as solved4_0_,
        post0_.tags as tags5_0_,
        post0_.title as title6_0_,
        post0_.user_id as user_id8_0_,
        post0_.views as views7_0_ 
    from
        post post0_ 
    left outer join
        user user1_ 
            on post0_.user_id=user1_.id 
    where
        user1_.id=?
Hibernate: 
    select
        postcommen0_.post_id as post_id3_1_1_,
        postcommen0_.id as id1_1_1_,
        postcommen0_.id as id1_1_0_,
        postcommen0_.post_id as post_id3_1_0_,
        postcommen0_.selected as selected2_1_0_ 
    from
        post_comment postcommen0_ 
    where
        postcommen0_.post_id in (
            select
                post0_.id 
            from
                post post0_ 
            left outer join
                user user1_ 
                    on post0_.user_id=user1_.id 
            where
                user1_.id=?
        )

4. BatchSize

하이버네이트가 제공하는 org.hibernate.annotations.BatchSize를 통해 연관된 엔티티를 조회할때 원하는 BatchSize만큼 IN 절에 사용하여 조회한다.

BatchSize = 10 이라고 하자.
Post를 조회할 시 11개가 조회된다.
BatchSize가 10이므로 Post 1번~10번을 where ~ in () 조건으로 넣어 10개의 Post에 대한 댓글들을 조회한다.
11번째 Post를 조회할 때 또 다른 쿼리가 호출된다.
만약 BatchSize가 3이였다면 1~3,4~6,7~9,10~11 총 4번 호출됐을 것이다.

장점

Fetch join과 EntityGraph와 달리 Join 연산이 필요가 없다.

단점

IN clause에 들어가는 BatchSize의 적절한 크기를 찾기 힘들 수 있다.

hibernate.default_batch_fetch_size 속성을 사용하면 애플리케이션 전체에 기본으로 @BatchSize를 적용할 수 있다.
<property name="hibernate.default_batch_fetch_size" value="10" />

// Post.java
@BatchSize(size = 10)
@OneToMany(mappedBy = "post", cascade = CascadeType.ALL)
private List<PostComment> postComments = new ArrayList<>();

// PostRepository.java
List<Post> findAllByUserId(Long id);

// 쿼리
Hibernate: 
    select
        post0_.id as id1_0_,
        post0_.content as content2_0_,
        post0_.search_counts as search_c3_0_,
        post0_.solved as solved4_0_,
        post0_.tags as tags5_0_,
        post0_.title as title6_0_,
        post0_.user_id as user_id8_0_,
        post0_.views as views7_0_ 
    from
        post post0_ 
    left outer join
        user user1_ 
            on post0_.user_id=user1_.id 
    where
        user1_.id=?
Hibernate: 
    select
        postcommen0_.post_id as post_id3_1_1_,
        postcommen0_.id as id1_1_1_,
        postcommen0_.id as id1_1_0_,
        postcommen0_.post_id as post_id3_1_0_,
        postcommen0_.selected as selected2_1_0_ 
    from
        post_comment postcommen0_ 
    where
        postcommen0_.post_id in (
            ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
        )
Hibernate: 
    select
        postcommen0_.post_id as post_id3_1_1_,
        postcommen0_.id as id1_1_1_,
        postcommen0_.id as id1_1_0_,
        postcommen0_.post_id as post_id3_1_0_,
        postcommen0_.selected as selected2_1_0_ 
    from
        post_comment postcommen0_ 
    where
        postcommen0_.post_id=?

5. QueryBuilder

Query를 자동으로 실행해주는 다양한 플러그인이 있다.
대표적으로 Mybatis, QueryDSL, JOOQ, JDBC Template 등이 있을 것이다. 이를 사용하면 로직에 최적화된 쿼리를 구현할 수 있다.

해당 플러그인이 어떻게 최적화된 쿼리를 구현하고 작동하는 지 이해하고 이보다 최적화할 수 없다면 사용해도 좋을 것 같다.

// 예시
// QueryDSL로 구현한 예제
return from(owner).leftJoin(owner.cats, cat)
                   .fetchJoin()

정리

이번 기회에 N+1 문제가 왜 생기는지에 대하여 알게 된 것같다.
다양한 해결방법이 존재하는만큼 프로젝트에 맞는 적절한 방법을 선택하는 것이 좋을 것 같고 상황에 따라 최적의 방법이 다를 것이다.
나의 경우 게시물들을 조회할때 페이지네이션을 적용시켜 10개정도를 함께 가져올 예정이기 때문에 BatchSize를 10으로 설정하면 Join 연산 없이 두번의 쿼리로 게시물과 댓글들을 함께 조회할 수 있을 것이다.

이후 과제

댓글들이 많아지거나 댓글들의 대댓글까지 관리하는 경우 게시물들을 10개씩 가져오더라도 부하가 심할 수 있다.
이 경우 해당 페이지의 게시물들만 따로 가져온 뒤 각 게시물들의 댓글들에 대해서도 페이지네이션을 적용하고 해당 페이지의 댓글들을 조회할때 대댓글을 함께 가져오는 전략을 취하면 될 것 같다.
(사실상 위에서 게시물들과 댓글들을 BatchSize를 통해 가져오는 전략과 비슷하다.)

Beanzinu

당신을 한 줄로 소개해보세요.

이전 포스트