Spring으로 비동기 프로그래밍을 돌리면서 이상한 현상이 발생했다. Thread가 10개만 동작하고 스프링 동작이 멈춰지는 것이었다. 심지어 스프링 기동 시에도 돌려야하는 작업이라 스프링 기동까지 해당 작업들을 기다리는 현상까지 발생했다. 뭐야...?
상황 재현을 위해 간단하게 밑의 코드로 Executor 빈을 만들었다.
@Bean
public Executor AsyncExecutor() {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(15);
threadPoolTaskExecutor.setMaxPoolSize(1000);
threadPoolTaskExecutor.setQueueCapacity(10);
threadPoolTaskExecutor.initialize();
return threadPoolTaskExecutor;
}
그리고 밑에서 해당 상황을 재현해본다.
@Slf4j
@Transactional
@RequiredArgsConstructor
@Service
public class FirstService {
private final SecondService secondService;
private final ThirdService thirdService;
public void doFirst() {
for (int i = 0; i < 20; i++) {
secondService.doSecond();
}
log.info("----------- DONE ------------");
}
}
@Slf4j
@Transactional
@Async
@Service
public class SecondService {
public void doSecond() {
try {
Thread.sleep(5000);
} catch (Exception e) {
log.error(e.getMessage());
}
log.info("[Second Service]: " + Thread.currentThread().getName());
}
}
그러면 예상할 수 있는 것은 15개의 Thread가 먼저 움직이고 5초 뒤 5개의 Thread가 남은 작업을 끝내는 것이다. 실행해보자.
2022-08-16 20:55:13.437 INFO 22796 --- [nio-8080-exec-1] c.i.b.api.member.service.FirstService : ----------- DONE ------------
2022-08-16 20:55:18.479 INFO 22796 --- [AsyncExecutor-7] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-7
2022-08-16 20:55:18.480 INFO 22796 --- [syncExecutor-11] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-11
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-5] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-5
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-9] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-9
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-8] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-8
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-6] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-6
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-1] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-1
2022-08-16 20:55:18.480 INFO 22796 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
2022-08-16 20:55:18.480 INFO 22796 --- [syncExecutor-12] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-12
2022-08-16 20:55:23.483 INFO 22796 --- [syncExecutor-15] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-15
2022-08-16 20:55:23.483 INFO 22796 --- [AsyncExecutor-3] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-3
2022-08-16 20:55:23.483 INFO 22796 --- [AsyncExecutor-7] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-7
2022-08-16 20:55:23.483 INFO 22796 --- [AsyncExecutor-8] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-8
2022-08-16 20:55:23.484 INFO 22796 --- [syncExecutor-10] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-10
2022-08-16 20:55:23.485 INFO 22796 --- [syncExecutor-14] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-14
2022-08-16 20:55:23.485 INFO 22796 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 20:55:23.486 INFO 22796 --- [syncExecutor-11] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-11
2022-08-16 20:55:23.485 INFO 22796 --- [AsyncExecutor-9] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-9
2022-08-16 20:55:23.484 INFO 22796 --- [syncExecutor-13] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-13
위의 로그를 보면 뭔가 잘못되었다는걸 느낄 수 있다. 쓰레드가 15개인데, 작업들이 5초 단위로 10개씩 로그가 찍힌다.
그리고 찾은 것은 예상치도 못한 곳이었다. Transaction과 DBCP!! 얘들 때문에 쓰레드가 10개만 도는 것이다. 진짜 원인 찾는데만 10시간 넘게 걸린 것 같다. 😂 로그를 볼 때 DB 커넥션이 끊어지던데, 그 끊어지는게 spring 자체가 기동이 중지되어서 인 줄 알았는데, DB커넥션이 진짜 Timeout 되어서 끊어진 것일 줄이야. 진짜 꿈에도 몰랐다.
찾게 된 방법은 Transactional annotation을 삭제한 것이었다. 삭제하니까 모든 스레드가 도는 것이다. 진짜 찾고나서 소리지를뻔 했다.
그럼 Transaction과 DBCP가 무슨 죄길래, ThreadPool이 지맘대로 움직이게 되는 것일까?
즉, DB Connection을 점유하기 때문에, DB Connection을 가지는 작업들이 싹다 정지되어 버리는 것이다. 특히 Transaction이 걸린 작업들은 매우 타격이 크다.
해결 방안은 크게 두 가지이다.
원인 자체를 없애는 방법이라 좋긴 하지만, DB의 성능도 고려해야 하고, 해당 DB의 Connection이 1000개로 제한되어 있고, 여러 서비스가 같이 사용하는 DB라면 Connection의 수도 마냥 늘려버리는 것도 한계가 있으므로 조심해야 한다. 10개의 서비스가 100씩만 가지기로 했는데, 우리만 120개 가져가면 안되니까 말이다.
spring:
datasource:
driver-class-name: "org.postgresql.Driver"
url: "jdbc:postgresql://***.***.***.***:15432/{DB_NAME}"
username: {name}
password: {pw}
hikari:
maximum-pool-size: 50
2022-08-16 21:18:20.033 INFO 30824 --- [nio-8080-exec-1] c.i.b.api.member.service.FirstService : ----------- DONE ------------
2022-08-16 21:18:25.048 INFO 30824 --- [syncExecutor-10] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-10
2022-08-16 21:18:25.048 INFO 30824 --- [AsyncExecutor-7] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-7
2022-08-16 21:18:25.049 INFO 30824 --- [AsyncExecutor-9] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-9
2022-08-16 21:18:25.048 INFO 30824 --- [syncExecutor-12] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-12
2022-08-16 21:18:25.049 INFO 30824 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
2022-08-16 21:18:25.048 INFO 30824 --- [syncExecutor-13] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-13
2022-08-16 21:18:25.048 INFO 30824 --- [syncExecutor-14] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-14
2022-08-16 21:18:25.048 INFO 30824 --- [AsyncExecutor-8] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-8
2022-08-16 21:18:25.048 INFO 30824 --- [AsyncExecutor-3] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-3
2022-08-16 21:18:25.048 INFO 30824 --- [syncExecutor-11] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-11
2022-08-16 21:18:25.049 INFO 30824 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 21:18:25.049 INFO 30824 --- [AsyncExecutor-5] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-5
2022-08-16 21:18:25.049 INFO 30824 --- [syncExecutor-15] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-15
2022-08-16 21:18:25.049 INFO 30824 --- [AsyncExecutor-1] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-1
2022-08-16 21:18:25.049 INFO 30824 --- [AsyncExecutor-6] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-6
2022-08-16 21:18:30.050 INFO 30824 --- [AsyncExecutor-7] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-7
2022-08-16 21:18:30.050 INFO 30824 --- [AsyncExecutor-9] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-9
2022-08-16 21:18:30.050 INFO 30824 --- [syncExecutor-12] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-12
2022-08-16 21:18:30.050 INFO 30824 --- [syncExecutor-10] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-10
2022-08-16 21:18:30.050 INFO 30824 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
로그를 보면 예상대로 15개 실행되고 5개가 남아서 실행된 것을 확인 가능하다.
ThreadPool을 쓰는 작업이 Transaction이 없고, dbConnection을 가지지 않는다면, 많이 두어도 상관없다. 근데 10미만보다는 4~6을 권장한다. 왜냐하면 Spring Transaction이 걸릴 때, Connection 2번 연결해버리는 경우도 많기 때문이고, 동시요청이 얼마나 많을지 모르기 때문이다. 밑의 예제에서는 5개로 설정하였다.
@Bean
public Executor AsyncExecutor() {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(5);
threadPoolTaskExecutor.setMaxPoolSize(5);
threadPoolTaskExecutor.setQueueCapacity(Integer.MAX_VALUE);
threadPoolTaskExecutor.initialize();
return threadPoolTaskExecutor;
}
maxPoolSize랑 queueCapacity는 기본 값이 둘 다 Integer.MAX_VALUE라서 core만 선택해줘도 된다.
2022-08-16 21:20:57.328 INFO 36484 --- [nio-8080-exec-1] c.i.b.api.member.service.FirstService : ----------- DONE ------------
2022-08-16 21:21:02.345 INFO 36484 --- [AsyncExecutor-5] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-5
2022-08-16 21:21:02.345 INFO 36484 --- [AsyncExecutor-3] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-3
2022-08-16 21:21:02.345 INFO 36484 --- [AsyncExecutor-1] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-1
2022-08-16 21:21:02.345 INFO 36484 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 21:21:02.345 INFO 36484 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
2022-08-16 21:21:07.346 INFO 36484 --- [AsyncExecutor-5] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-5
2022-08-16 21:21:07.346 INFO 36484 --- [AsyncExecutor-1] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-1
2022-08-16 21:21:07.346 INFO 36484 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 21:21:07.346 INFO 36484 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
2022-08-16 21:21:07.346 INFO 36484 --- [AsyncExecutor-3] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-3
2022-08-16 21:21:12.347 INFO 36484 --- [AsyncExecutor-3] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-3
2022-08-16 21:21:12.347 INFO 36484 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
2022-08-16 21:21:12.347 INFO 36484 --- [AsyncExecutor-1] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-1
2022-08-16 21:21:12.347 INFO 36484 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 21:21:12.349 INFO 36484 --- [AsyncExecutor-5] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-5
2022-08-16 21:21:17.350 INFO 36484 --- [AsyncExecutor-3] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-3
2022-08-16 21:21:17.350 INFO 36484 --- [AsyncExecutor-2] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-2
2022-08-16 21:21:17.352 INFO 36484 --- [AsyncExecutor-4] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-4
2022-08-16 21:21:17.353 INFO 36484 --- [AsyncExecutor-1] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-1
2022-08-16 21:21:17.355 INFO 36484 --- [AsyncExecutor-5] c.i.b.api.member.service.SecondService : [Second Service]: AsyncExecutor-5
로그를 보면 5개씩 실행된 것을 확인할 수 있다.
이거 처음 겪었을 때, 다른 업무도 해야 하기에 임시방편만 조치해두고 계속 찾아보고 있었다. 임시방편으로도 기능 상은 문제없지만, 원인을 못 찾는 버그는 진짜 맘에 계속 남아서 주변 친구들이랑 형, 팀원 모두에게 물어보면서 원인을 찾았는데, 못 찾았었다. 그리고 이걸 찾게 되어서 물어본 사람들에게 다시 알려주고 있다 ㅎㅎㅎ 공유의 즐거움인건가, 뭔가 기분이 좋다. 😁