[Side Project] Building a React & Flutter Chat App 6 - Measuring Server Performance - Success

gimseonjin616 · March 14, 2022

React Springboot flutter websocket side project

Building a web & app chat program with WebSocket


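Introduction

Last time I tried to load-test the WebSocket server with Artillery. Artillery supports WebSocket but not STOMP, so I could only verify that a connection was established, not that messages were actually sent.

So this time I'll try again with Locust, a different web load-testing tool.

What is Locust?

Locust is an open-source load-testing tool written in Python. It provides a web UI that makes monitoring easy, and because it is event-based, a single machine can simulate a large number of concurrent users. Best of all, test scenarios are written in plain Python.

Installing Locust and the libraries

Locust can be installed easily with the pip3 package manager.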

pip3 install locust

I'll also use the stomper and websocket-client libraries.

stomper builds messages that follow the STOMP protocol, and websocket-client sends and receives data over the WebSocket connection.

pip3 install stomper
pip3 install websocket-client
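To make it clearer what a "message that follows the STOMP protocol" looks like on the wire, here is a minimal sketch that assembles a frame by hand. `build_stomp_frame` is a hypothetical helper written for illustration only; it is not part of stomper, which does this job in the actual test. The destination and payload are the ones used later in this post.

```python
import json

NULL = "\x00"  # every STOMP frame is terminated by a NULL byte


def build_stomp_frame(command, headers, body=""):
    """Return a STOMP frame: command line, header lines, blank line, body, NULL.

    Hypothetical helper for illustration; stomper builds the real frames.
    """
    header_lines = "".join(f"{key}:{value}\n" for key, value in headers.items())
    return f"{command}\n{header_lines}\n{body}{NULL}"


if __name__ == "__main__":
    connect = build_stomp_frame("CONNECT", {"accept-version": "1.0,1.1,1.2"})
    send = build_stomp_frame(
        "SEND",
        {"destination": "/app/message"},
        json.dumps({"content": "Hi!", "uuid": ""}),
    )
    # repr() makes the newlines and the trailing NULL byte visible.
    print(repr(connect))
    print(repr(send))
```

A WebSocket only carries opaque text or binary messages, so each frame like this is sent as one WebSocket message. That is why a tool that supports WebSocket alone is not enough to exercise a STOMP endpoint.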

Writing locustfile.py

Now let's write a locustfile and run the test.

First, implement a StompClient class that builds and sends the STOMP messages.


import json
import time

import stomper
from locust import events
from websocket import create_connection  # provided by websocket-client


# Client that builds STOMP messages and sends them over a WebSocket.
class StompClient(object):
    def __init__(self, host, port, endpoint):
        self.ws_uri = f"ws://{host}:{port}/{endpoint}"
        self.ws = None  # set by connect(); defined here so close() is always safe

    def __del__(self):
        if self.ws:
            print("disconnect...")
            self.ws.close()
    
    def close(self):
        if self.ws:
            self.ws.close()
            self.ws = None

    def connect(self):
        start_time = time.time()
        try:
            self.ws = create_connection(self.ws_uri)
            # STOMP CONNECT frame: command, headers, blank line, NULL terminator.
            self.ws.send("CONNECT\naccept-version:1.0,1.1,1.2\n\n\x00\n")

        except Exception as e:
            # Report the failure to Locust. events.request is the single hook
            # that replaced the older request_success / request_failure events.
            total_time = int((time.time() - start_time) * 1000)
            events.request.fire(request_type="stomp", name="connect", response_time=total_time, response_length=0, exception=e)

        else:
            total_time = int((time.time() - start_time) * 1000)
            events.request.fire(request_type="stomp", name="connect", response_time=total_time, response_length=0, exception=None)
        
    def send(self, body, destination):
        start_time = time.time()
        try:
            # Build a STOMP SEND frame with a JSON body and push it over the socket.
            msg = stomper.Frame()
            msg.cmd = 'SEND'
            msg.headers = {'destination': destination}
            msg.body = json.dumps(body)
            self.ws.send(msg.pack())

        except Exception as e:
            total_time = int((time.time() - start_time) * 1000)
            events.request.fire(request_type="stomp", name="send", response_time=total_time, response_length=0, exception=e)

        else:
            total_time = int((time.time() - start_time) * 1000)
            events.request.fire(request_type="stomp", name="send", response_time=total_time, response_length=0, exception=None)

Next, create the class that Locust will use to run the test code.
(It inherits from HttpUser.)

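from locust import HttpUser, between, task


class WebsocketUser(HttpUser):

    host = "127.0.0.1"
    port = 8080
    endpoint = "chatting"

    # Wait 5 to 9 seconds between task runs.
    # (wait_time replaces the older min_wait / max_wait attributes.)
    wait_time = between(5, 9)

    def on_start(self):
        # Only create the client here; the task opens its own connection,
        # so connecting here as well would leave an extra socket open.
        self.stompClient = StompClient(self.host, self.port, self.endpoint)
        
    '''
    The scenario is as follows.
    1. The user connects to the site.
    2. After waiting 3 seconds, the user sends a "Hi!" message.
    3. After another 3 seconds, the user sends "Bye!", waits 3 more seconds, and disconnects.
    '''
    
    @task()
    def send(self):
        self.stompClient.connect()
        time.sleep(3)
        self.stompClient.send({"content":"Hi!", "uuid":""},"/app/message")
        time.sleep(3)
        self.stompClient.send({"content":"Bye!", "uuid":""},"/app/message")
        time.sleep(3)
        self.stompClient.close()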

Running the locust command executes locustfile.py and brings up a web UI on port 8089 where the test can be managed.

$ locust

First, I'll test the WebSocket server running in my local environment.

The number of users grows by 1 per second up to a maximum of 10.
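As a rough sanity check for the numbers in the report, the scenario itself puts a ceiling on the request rate: each pass through the task reports three requests (one connect and two sends) and spends 9 seconds sleeping. The sketch below is only back-of-the-envelope arithmetic under those assumptions; `expected_rps` is a hypothetical helper, and it ignores network time and the ramp-up phase.

```python
def expected_rps(users, requests_per_iteration, busy_seconds, avg_wait_seconds=0.0):
    """Approximate steady-state requests per second for `users` looping users."""
    iteration_seconds = busy_seconds + avg_wait_seconds
    return users * requests_per_iteration / iteration_seconds


if __name__ == "__main__":
    # 10 users, 3 reported requests per iteration, 3 x 3 s of sleep per iteration.
    print(expected_rps(10, 3, 9))      # no pause between iterations
    print(expected_rps(10, 3, 9, 7))   # 5 to 9 s pause, 7 s on average
```

If the RPS shown by Locust is far from this estimate, the usual suspects are the wait time between tasks not being applied as intended or requests failing early.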

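Results

When looking at server performance, there are two main numbers to check: throughput and latency. In the Locust report they appear as RPS and Response Time, respectively.

Because this server speaks WebSocket, response time here mainly reflects how the server performs when connections are opened and closed.

RPS is closely tied to the send step.

I plan to cover throughput and latency in a separate post, so I won't go into detail here.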
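For readers who want a concrete picture of the two numbers, here is a minimal sketch of how throughput and latency can be derived from raw measurements. The sample values are made up for illustration and are not from this test; `percentile` and `summarize` are hypothetical helpers, and Locust computes its own statistics internally.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarize(response_times_ms, duration_seconds):
    """Throughput (RPS) and latency figures for one measurement window."""
    ordered = sorted(response_times_ms)
    return {
        "rps": len(ordered) / duration_seconds,      # throughput
        "median_ms": statistics.median(ordered),     # typical latency
        "p95_ms": percentile(ordered, 95),           # tail latency
    }


if __name__ == "__main__":
    # Made-up response times in milliseconds, collected over 5 seconds.
    samples = [12, 15, 11, 30, 14, 13, 250, 16, 12, 14]
    print(summarize(samples, duration_seconds=5))
```

Throughput counts how many requests complete per unit of time, while latency describes how long each one takes. A single slow outlier barely moves the median but dominates the high percentiles, which is why load-test reports show both.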

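Monitoring tool: New Relic

A load test needs more than a load-testing tool. You also need a monitoring tool that shows the state of the server.

I have used Elastic APM before, but this time I want to try New Relic.

New Relic first has to be installed on the server, so go to the Installation plan.

Then select APM, then Java.

Then proceed with the installation.

Copy the install command and run it on the EC2 server.

Next, go through the step that registers the Java application.

In the end APM is installed, and you can confirm that data is coming in.

Running the Locust test

Now let's point Locust at the EC2 server's address and run the test.

The scenario is the same as before: 10 users come in, say hello, and leave.

The server handles this much load comfortably.

So can it still cope if the number of users goes up to 100?

RPS went up, but the server still handled it comfortably.
(Throughput momentarily climbed to 400.)

I need to push the load further, but simply raising the user count is not enough. So I'll add workers (formerly called slaves) to Locust to increase the number of concurrent users.

Because of Python's GIL, threads in a single Python process cannot run in parallel, so a single Locust process is effectively limited to one CPU core. Locust therefore scales out with multiple processes, and running one worker per CPU core is the most efficient setup.

My machine has 2 CPU cores (... I'll buy a better one someday).

So I'll use 2 workers and start the master and the workers as follows.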

locust --master
locust --worker
locust --worker
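The "one worker per CPU core" rule can be sketched in Python as well. This is only an illustration of the rule, not part of the test setup: `locust_commands` is a hypothetical helper that builds the same commands shown above from the core count reported by the OS, without launching anything.

```python
import os


def locust_commands(cores=None):
    """Return one master command plus one worker command per CPU core."""
    # os.cpu_count() can return None, so fall back to a single worker.
    cores = cores or os.cpu_count() or 1
    return [["locust", "--master"]] + [["locust", "--worker"] for _ in range(cores)]


if __name__ == "__main__":
    for command in locust_commands():
        print(" ".join(command))
```

Each printed command is meant to be run in its own terminal from the directory that contains locustfile.py. On a 2-core machine this yields one master and two workers, matching the commands above.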

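The top right of the UI then shows that 2 workers have been created.

This time I'll try a test with 500 users.

In the Locust report, RPS drops sharply and response time shoots up once the number of connected users reaches the 200s, which indicates that the server went down at that point.

The New Relic report tells the same story: throughput peaks at 603 (measured in RPM here) and then CPU and memory usage fall off sharply, so I conclude that the server went down.

Based on this, my WebSocket EC2 server delivers a throughput of 603 RPM (58 RPS) and, since the server crashed during the run, a latency of 30 ms when only the top 80% of the measurements are counted.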
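New Relic reports throughput per minute (RPM) while Locust reports it per second (RPS), so it helps to convert to a common unit before comparing the two. The sketch below shows only the unit conversion, with hypothetical helper names. Keep in mind that the two tools do not necessarily count the same events (an assumption here: Locust counts every connect and send that the script reports, while New Relic counts the transactions its agent instruments), so the converted figures need not match.

```python
def rpm_to_rps(requests_per_minute):
    """Convert requests per minute to requests per second."""
    return requests_per_minute / 60


def rps_to_rpm(requests_per_second):
    """Convert requests per second to requests per minute."""
    return requests_per_second * 60


if __name__ == "__main__":
    print(rpm_to_rps(603))  # New Relic peak throughput expressed per second
    print(rps_to_rpm(58))   # the RPS figure expressed per minute
```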

And because there is no separate DB or external service involved, there is no other place where a bottleneck could appear.

Conclusion

Websocket Server Spec

Server Framework : Spring boot

Server OS : Linux

Throughput & Latency

Throughput	Latency
603 RPM	30 ms
