What is the difference between deleteAllById and deleteAllByIdInBatch?

허석진 · March 14, 2023

Batch processing deleteAllById deleteAllByIdInBatch spring-data-jpa

While refactoring a project, I found out that deleteAllById and deleteAllByIdInBatch behave very differently.
If you don't dig any deeper, the difference fits in two lines:

deleteAllById selects each id first and then deletes each entity individually.
deleteAllByIdInBatch, on the other hand, executes a single delete that uses IN.

Why does it work this way?

The code of `deleteAllById`

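Honestly, there is no deep "why" here: it behaves this way because that is how the code is written!
Below is the deleteAllById code (from SimpleJpaRepository) taken from the Spring Data JPA GitHub repository.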

@Override
@Transactional
public void deleteAllById(Iterable<? extends ID> ids) {

    Assert.notNull(ids, "Ids must not be null");

    for (ID id : ids) {
        deleteById(id);
    }
}
 
@Transactional
@Override
public void deleteById(ID id) {

    Assert.notNull(id, ID_MUST_NOT_BE_NULL);

    findById(id).ifPresent(this::delete);
}

@Override
@Transactional
@SuppressWarnings("unchecked")
public void delete(T entity) {

    Assert.notNull(entity, "Entity must not be null");

    if (entityInformation.isNew(entity)) {
        return;
    }

    Class<?> type = ProxyUtils.getUserClass(entity);

    T existing = (T) em.find(type, entityInformation.getId(entity));

    // if the entity to be deleted doesn't exist, delete is a NOOP
    if (existing == null) {
        return;
    }

    em.remove(em.contains(entity) ? entity : em.merge(entity));
}

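As the code shows, deleteAllById iterates over the Iterable<? extends ID> and calls deleteById for each id, which runs findById(id).ifPresent(this::delete);.
A non-existent id is already filtered out at this point: findById returns an empty Optional, so delete is never called and no error is raised. delete itself adds one more guard. It looks the entity up through the EntityManager (em.find, which is answered from the persistence context because findById has just loaded the entity) and, if the entity no longer exists, quietly returns via if (existing == null) { return; }.
(If the Iterable<? extends ID> contains a null, however, Assert.notNull in deleteById throws an IllegalArgumentException.)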
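The following is a minimal plain-Java toy model of what deleteAllById does. It is not Spring Data JPA's code. The `DeleteAllByIdModel` class, the Map standing in for table `a`, and the query strings are all illustrative assumptions. Removals are only scheduled at first and are sent in `commit()`. This mimics Hibernate flushing at commit and produces the same order as the test log that follows: two selects, then one delete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of deleteAllById, NOT Spring Data JPA's real code.
// The Map stands in for table "a"; the strings only label the SQL that would run.
public class DeleteAllByIdModel {

    private final Map<Long, String> table = new HashMap<>();
    private final List<Long> pendingDeletes = new ArrayList<>(); // removals waiting for the flush
    private final List<String> issuedQueries = new ArrayList<>();

    public DeleteAllByIdModel(Map<Long, String> rows) {
        table.putAll(rows);
    }

    public void deleteAllById(Iterable<Long> ids) {
        if (ids == null) {
            throw new IllegalArgumentException("Ids must not be null");
        }
        for (Long id : ids) {
            deleteById(id);
        }
    }

    private void deleteById(Long id) {
        if (id == null) {
            // Spring's Assert.notNull also throws IllegalArgumentException here
            throw new IllegalArgumentException("id must not be null");
        }
        issuedQueries.add("select from a where id=" + id); // findById(id): one select per id
        if (table.containsKey(id)) {                        // ifPresent(this::delete)
            pendingDeletes.add(id);                         // em.remove(): only scheduled
        }
    }

    // Mimics the flush at commit, where Hibernate actually sends the deletes
    public void commit() {
        for (Long id : pendingDeletes) {
            table.remove(id);
            issuedQueries.add("delete from a where id=" + id);
        }
        pendingDeletes.clear();
    }

    public List<String> issuedQueries() { return issuedQueries; }

    public Map<Long, String> table() { return table; }

    public static void main(String[] args) {
        DeleteAllByIdModel model = new DeleteAllByIdModel(Map.of(1L, "first", 2L, "second"));
        model.deleteAllById(List.of(1L, 99L)); // one existing id, one missing id
        model.commit();
        model.issuedQueries().forEach(System.out::println);
    }
}
```

With one existing id and one missing id, the model issues one select per id and a single delete for the id that was found. The missing id is skipped without an error.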

I wrote an actual test that passes one existing id and one non-existent id in the Iterable<? extends ID>. It produced the following queries:

Hibernate: 
    select
        a1_0.id,
        a1_0.b_id,
        a1_0.label 
    from
        a a1_0 
    where
        a1_0.id=?
Hibernate: 
    select
        a1_0.id,
        a1_0.b_id,
        a1_0.label 
    from
        a a1_0 
    where
        a1_0.id=?
2023-03-14T21:44:28.976+09:00 DEBUG 14968 --- [nio-8080-exec-1] o.s.orm.jpa.JpaTransactionManager        : Initiating transaction commit
2023-03-14T21:44:28.976+09:00 DEBUG 14968 --- [nio-8080-exec-1] o.s.orm.jpa.JpaTransactionManager        : Committing JPA transaction on EntityManager [SessionImpl(1361655427<open>)]
Hibernate: 
    delete 
    from
        a 
    where
        id=?

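As the implementation suggested, the select runs twice but the delete runs only for the id that exists. The delete is also not sent immediately. Hibernate issues it when the persistence context is flushed at transaction commit, which is why it appears after the "Committing JPA transaction" log line.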

The code of `deleteAllByIdInBatch`

This code also comes from the same GitHub repository.

public static final String DELETE_ALL_QUERY_STRING = "delete from %s x";
public static final String DELETE_ALL_QUERY_BY_ID_STRING = "delete from %s x where %s in :ids";

@Override
@Transactional
public void deleteAllByIdInBatch(Iterable<ID> ids) {

    Assert.notNull(ids, "Ids must not be null");

    if (!ids.iterator().hasNext()) {
        return;
    }

    if (entityInformation.hasCompositeId()) {

        List<T> entities = new ArrayList<>();
        // generate entity (proxies) without accessing the database.
        ids.forEach(id -> entities.add(getReferenceById(id)));
        deleteAllInBatch(entities);
    } else {

        String queryString = String.format(DELETE_ALL_QUERY_BY_ID_STRING, entityInformation.getEntityName(),
                entityInformation.getIdAttribute().getName());

        Query query = em.createQuery(queryString);
        /**
            * Some JPA providers require {@code ids} to be a {@link Collection} so we must convert if it's not already.
            */
        if (Collection.class.isInstance(ids)) {
            query.setParameter("ids", ids);
        } else {
            Collection<ID> idsCollection = StreamSupport.stream(ids.spliterator(), false)
                    .collect(Collectors.toCollection(ArrayList::new));
            query.setParameter("ids", idsCollection);
        }

        applyQueryHints(query);

        query.executeUpdate();
    }
}

@Override
@Transactional
public void deleteAllInBatch(Iterable<T> entities) {

    Assert.notNull(entities, "Entities must not be null");

    if (!entities.iterator().hasNext()) {
        return;
    }

    applyAndBind(getQueryString(DELETE_ALL_QUERY_STRING, entityInformation.getEntityName()), entities, em)
            .executeUpdate();
}

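When the entity has a composite key, getReferenceById is used to create proxy objects, so no select statement is issued.
In both cases, as long as the given ids are non-null and not empty, all the work is done with a single query. For a simple id this is one delete with an IN clause. For a composite id, deleteAllInBatch (via applyAndBind) builds one delete whose where clause joins one condition per entity with or.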
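The two branches can be sketched as plain string building. This is a simplified illustration, not Spring's implementation. The simple-id string reuses DELETE_ALL_QUERY_BY_ID_STRING from the code above. The composite-id string assumes that `QueryUtils.applyAndBind` appends one `x = ?n` condition per entity and joins them with `or`. The entity names `A` and `OrderItem` are hypothetical.

```java
// Simplified sketch of the JPQL that deleteAllByIdInBatch ends up executing.
// Assumption: the composite-id branch mirrors QueryUtils.applyAndBind, which appends
// one "x = ?n" condition per entity joined with "or". This is not Spring's actual code.
public class BatchDeleteQueries {

    static final String DELETE_ALL_QUERY_STRING = "delete from %s x";
    static final String DELETE_ALL_QUERY_BY_ID_STRING = "delete from %s x where %s in :ids";

    // Simple (single-column) id: one statement with the whole id collection bound to :ids
    public static String simpleIdQuery(String entityName, String idAttribute) {
        return String.format(DELETE_ALL_QUERY_BY_ID_STRING, entityName, idAttribute);
    }

    // Composite id: one statement with one positional parameter per entity proxy.
    // Callers never pass 0, because deleteAllInBatch returns early for an empty input.
    public static String compositeIdQuery(String entityName, int entityCount) {
        StringBuilder builder = new StringBuilder(String.format(DELETE_ALL_QUERY_STRING, entityName));
        builder.append(" where");
        for (int i = 1; i <= entityCount; i++) {
            builder.append(String.format(" x = ?%d", i));
            if (i < entityCount) {
                builder.append(" or");
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(simpleIdQuery("A", "id"));
        System.out.println(compositeIdQuery("OrderItem", 2));
    }
}
```

For two ids, `simpleIdQuery("A", "id")` yields `delete from A x where id in :ids` and `compositeIdQuery("OrderItem", 2)` yields `delete from OrderItem x where x = ?1 or x = ?2`. Either way, only one statement is executed.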

I ran the same test as for deleteAllById above, with one existing id and one non-existent id in the Iterable<? extends ID>. It produced only the single query below.

Hibernate: 
    delete 
    from
        a 
    where
        id in(?,?)

Because it uses the IN operator, it works correctly whether or not the ids exist. Ids that match no row are simply ignored.
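For contrast with deleteAllById, here is a toy model of the simple-id path of deleteAllByIdInBatch. It is an illustration under assumptions: the `InBatchDeleteModel` class and its Map-backed table are hypothetical. Exactly one statement is recorded no matter how many ids are passed, and ids that match no row are not counted. The real method returns `void` and ignores the row count from `executeUpdate()`. The model returns that count only so it can be checked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of deleteAllByIdInBatch for a simple id, NOT Spring Data JPA's real code.
// The Map stands in for table "a"; the strings only label the SQL that would run.
public class InBatchDeleteModel {

    private final Map<Long, String> table = new HashMap<>();
    private final List<String> issuedQueries = new ArrayList<>();

    public InBatchDeleteModel(Map<Long, String> rows) {
        table.putAll(rows);
    }

    // Returns the number of deleted rows (the real method returns void)
    public int deleteAllByIdInBatch(List<Long> ids) {
        if (ids.isEmpty()) {
            return 0; // like the real method: no query for an empty list
        }
        // One statement with one placeholder per id, e.g. "id in(?,?)"
        String placeholders = ids.stream().map(id -> "?").collect(Collectors.joining(","));
        issuedQueries.add("delete from a where id in(" + placeholders + ")");

        int deleted = 0;
        for (Long id : ids) {
            if (table.remove(id) != null) {
                deleted++; // ids with no matching row are simply ignored
            }
        }
        return deleted;
    }

    public List<String> issuedQueries() { return issuedQueries; }

    public Map<Long, String> table() { return table; }

    public static void main(String[] args) {
        InBatchDeleteModel model = new InBatchDeleteModel(Map.of(1L, "first", 2L, "second"));
        int deleted = model.deleteAllByIdInBatch(List.of(1L, 99L));
        System.out.println(model.issuedQueries() + ", rows deleted: " + deleted);
    }
}
```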

Why keep them separate?

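After getting this far, the first question that came to mind was why the two are separated at all.
Isn't handling everything in one go with IN, without any select, obviously better?
Or does the other approach still have clear advantages in some cases?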

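Pros and cons of `deleteAllById`

Pros

Each entity is removed through EntityManager.remove(), so the persistence context stays in sync with the database after the records are deleted (the behavior we normally take for granted)
Because every entity goes through the regular remove() path, entity-level features such as cascade REMOVE and lifecycle callbacks apply as usual

Cons

Runs a select and a delete for every entity, which means up to 2N queries for N ids and lower performance (an N+1-style problem)
Every entity it loads stays managed in the persistence context until the transaction ends, so memory use grows with the number of ids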

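Pros and cons of `deleteAllByIdInBatch`

Pros

Only one delete query is executed, so performance improves

Cons

It bypasses the persistence context: after the records are deleted in the DB, entities already loaded into the persistence context are neither updated nor removed
Deleting a very large number of ids in one call produces a single, very long IN list, and some databases and drivers limit how many bind parameters one statement may have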
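When one call would receive a very large id list, a common mitigation is to split the list and call deleteAllByIdInBatch once per chunk. Spring Data does not do this for you. Below is a minimal sketch that uses a helper of our own named `chunk` (hypothetical, not a Spring API). The right chunk size depends on your database and driver. The value 2 is used only for the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a large id list into fixed-size chunks so that each
// deleteAllByIdInBatch call binds a bounded number of parameters.
// The chunk size is an assumption; choose it from your database's limits.
public class IdChunker {

    public static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            // copy so each chunk does not depend on the source list
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L, 4L, 5L);
        for (List<Long> chunk : chunk(ids, 2)) {
            // In a Spring service this would be, e.g., aRepository.deleteAllByIdInBatch(chunk);
            System.out.println(chunk);
        }
    }
}
```

Each chunk still costs only one delete, so N ids become about N / chunkSize statements instead of one. This trades a few extra round trips for statements that stay within the database's limits.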

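Beyond these, I've read that the two also differ in whether entity lifecycle callbacks and event listeners run and whether concurrency control applies. I couldn't find a precise reference, though, so I'll leave it at that.
(The explanation I found is that lifecycle handling, such as @PreRemove callbacks, and versioning for concurrency control, such as @Version-based optimistic locking, work per entity. They therefore don't apply when InBatch deletes everything in one statement.)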

Summary of the pros and cons, with an example

In short, deleteAllByIdInBatch does not reflect the deleted entities in the persistence context, but it performs better! That seems like a fair summary.

Here is example code in which this happens.

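// Assumes all calls below share one persistence context
// (e.g. the method is @Transactional, or open-in-view is enabled for the web request)
public void testServiceFunc() {
    ArrayList<Long> ids = new ArrayList<>();
    for (long i = 3; i <= 4; i++) ids.add(i); // ids = [3, 4]

    aRepository.findById(3L);                 // loads entity 3 into the persistence context
    aRepository.deleteAllByIdInBatch(ids);    // bulk delete: rows 3 and 4 are removed in the DB only
    Optional<A> a = aRepository.findById(3L); // still returns entity 3 from the persistence context
}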

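Suppose the entity with id 3 exists in the DB and findById(3L) has loaded it into the persistence context. I confirmed that even after deleting it with deleteAllByIdInBatch, findById(3L) still returns the entity that remains in the persistence context.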
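The effect comes from the first-level cache. A find for an id that is already managed returns that instance without going to the DB, while a bulk delete only touches the DB. The toy model below (a hypothetical `StaleFirstLevelCacheModel`, not Hibernate's code) reproduces the example. It also shows that clearing the context makes the next lookup read from the DB again.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model (NOT Hibernate) of why findById(3L) still "sees" an entity
// after deleteAllByIdInBatch within the same persistence context.
public class StaleFirstLevelCacheModel {

    private final Map<Long, String> database = new HashMap<>();
    private final Map<Long, String> persistenceContext = new HashMap<>(); // first-level cache

    public StaleFirstLevelCacheModel(Map<Long, String> rows) {
        database.putAll(rows);
    }

    // Like EntityManager.find: return the managed instance if present, else load it from the DB
    public Optional<String> findById(Long id) {
        if (persistenceContext.containsKey(id)) {
            return Optional.of(persistenceContext.get(id));
        }
        String row = database.get(id);
        if (row != null) {
            persistenceContext.put(id, row);
        }
        return Optional.ofNullable(row);
    }

    // Like a JPQL bulk delete: goes straight to the DB and never touches the cache
    public void deleteAllByIdInBatch(List<Long> ids) {
        ids.forEach(database::remove);
    }

    // Like EntityManager.clear: detach everything so the next find hits the DB
    public void clear() {
        persistenceContext.clear();
    }

    public static void main(String[] args) {
        StaleFirstLevelCacheModel ctx = new StaleFirstLevelCacheModel(Map.of(3L, "a3", 4L, "a4"));
        ctx.findById(3L);                          // loads id 3 into the context
        ctx.deleteAllByIdInBatch(List.of(3L, 4L)); // the rows are gone from the DB
        System.out.println(ctx.findById(3L));      // still present: served from the context
        ctx.clear();
        System.out.println(ctx.findById(3L));      // empty: re-read from the DB
    }
}
```

Running `main` prints `Optional[a3]` and then `Optional.empty`.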

Conclusion

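Even so, I plan to use deleteAllByIdInBatch in my project.
In my project, no single request loads entities into the persistence context, deletes them, and then reads them again as in the example above. I also consider going from 2N + 1 queries down to 1 + 1 a meaningful performance improvement.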

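If it still bothers you that the persistence context isn't updated, you can synchronize it through the EntityManager. flush() alone is not enough, because it only writes pending changes to the DB and does not evict stale entities. Call flush() before the bulk delete and clear() after it, so the next findById reads from the DB again.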
