Bigdata, Benchmark

Jeonghak Cho · March 22, 2025

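Introduction to Benchmarking

Benchmarking is the process of measuring the performance of a system, application, network, or piece of hardware in order to evaluate and optimize it. It gives an objective picture of current performance and points to where improvements should be made.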

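Purpose

Performance Measurement

Verify that hardware or software delivers the expected performance
Compare performance before and after a change to confirm the effect of an optimization

Comparative Analysis

Run the same workload in several environments and compare the results
Decide which technology, configuration, or hardware option performs better

Bottleneck Identification

Find the slowest part of the system to reveal optimization opportunities
Analyze individual components such as CPU, memory, disk, and network

Scalability Testing

Check that the system keeps working correctly as load increases
Evaluate scalability in cluster, container, and cloud environments

Optimization & Cost Reduction

Use hardware and software resources efficiently without degrading performance
Cut unnecessary infrastructure costs and plan investments more effectively

Reliability & Stability Testing

Check for performance degradation or failures during long runs
Evaluate whether the system still works correctly under extreme load (stress test)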

Test Environment

CPU: AMD Ryzen 9 5950X
Memory: 32 GB DDR4
Disk: NVMe SSD
OS: Ubuntu 22.04

Command Execution Speed

Hyperfine

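Hyperfine is a Rust-based benchmarking tool that measures how long commands take to run. It is useful for comparing the execution time of shell scripts, CLI commands, and programs.

Hyperfine usage

| Feature | Example | Description |
|------------------|---------------------------|----------------------------|
| Basic benchmark | `hyperfine "sleep 1"` | Runs the command several times and reports timing statistics |
| Command comparison | `hyperfine "ls -l" "ls -lh"` | Benchmarks both commands and reports which one is faster |
| Fixed number of runs | `hyperfine -r 10 "echo hello"` | Runs the command 10 times and reports the mean |
| Save results (CSV) | `hyperfine "ls" --export-csv result.csv` | Saves the benchmark results to a CSV file for later analysis |
| Environment variables | `MY_ENV=prod hyperfine 'echo $MY_ENV'` | Benchmarks with a different environment; the variable is inherited by the benchmarked command, and the single quotes keep the calling shell from expanding `$MY_ENV` early |
| Spark run-time comparison | `hyperfine "spark-submit job1.py" "spark-submit --master local[4] job1.py"` | Runs the same Spark job with different options to see which configuration is faster |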

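`hyperfine` output

| Field | Meaning |
|------------------|---------------------------|
| Time (mean ± σ) | Mean execution time ± standard deviation |
| User | CPU time spent in user mode |
| System | CPU time spent in kernel mode |
| Range (min … max) | Shortest and longest execution times |
| 10 runs | Number of runs (chosen automatically by default) |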
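
To make these fields concrete, here is a minimal Python sketch of the same idea: run a command several times, then report mean ± σ and the min/max range. It is not how Hyperfine itself is implemented (Hyperfine also handles things like warmup runs and outlier warnings), and the helper names `time_command` and `summarize` are made up for this illustration.

```python
import shlex
import statistics
import subprocess
import time


def time_command(command: str, runs: int = 10) -> list[float]:
    """Run `command` `runs` times and return wall-clock durations in seconds."""
    args = shlex.split(command)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Compute mean, standard deviation, min, max, and run count from raw run times."""
    return {
        "mean": statistics.mean(durations),
        # The sample standard deviation needs at least two runs.
        "stddev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
        "runs": len(durations),
    }


if __name__ == "__main__":
    stats = summarize(time_command("sleep 0.1", runs=5))
    print(f"Time (mean ± σ): {stats['mean']:.3f} s ± {stats['stddev']:.3f} s")
    print(f"Range (min … max): {stats['min']:.3f} s … {stats['max']:.3f} s   {stats['runs']} runs")
```

A large σ relative to the mean usually means the runs were disturbed by something else on the machine, so the comparison should be repeated before drawing conclusions.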

time

The time command measures how long a program or command takes to run. It reports the elapsed wall-clock time (real) and the CPU time consumed (user, sys).

test.py

# Generate a large text file (100 million lines, roughly 6 GB) as a write-heavy workload.
with open("bigfile.txt", "w") as f:
    for i in range(100000000):
        f.write(f"Line {i}: This is sample text data for testing purposes.\n")

vagrant@slave2:~$ time python3 test.py

real    2m4.177s
user    0m15.554s
sys     0m11.903s

`time` output

| Field | Meaning |
|------------------|---------------------------|
| real | Elapsed wall-clock time |
| user | CPU time spent in user mode |
| sys | CPU time spent in kernel mode |

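For a single-threaded program, real is normally the largest of the three values, and the closer user + sys is to real, the higher the CPU utilization during the run. In the run above, user + sys is about 27.5 s against 124.2 s of real time, so the script spent most of its time waiting, most likely on disk I/O.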
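
The relationship between the three values can be checked with a few lines of Python. The sketch below parses the `real`/`user`/`sys` strings as printed by the shell and computes (user + sys) / real for the run shown above; `parse_duration` and `cpu_utilization` are hypothetical helper names, not part of any tool.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a shell `time` value such as '2m4.177s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def cpu_utilization(real: str, user: str, system: str) -> float:
    """Fraction of wall-clock time during which the process was running on a CPU."""
    return (parse_duration(user) + parse_duration(system)) / parse_duration(real)


if __name__ == "__main__":
    ratio = cpu_utilization("2m4.177s", "0m15.554s", "0m11.903s")
    print(f"CPU utilization: {ratio:.1%}")  # about 22% for the run above
```

For multi-threaded programs the ratio can exceed 1.0, because user and sys are summed over all cores.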

CPU / Memory Performance

sysbench

CPU performance

vagrant@slave2:~$ sysbench cpu --cpu-max-prime=20000 run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1854.13

General statistics:
    total time:                          10.0012s
    total number of events:              18545

Latency (ms):
         min:                                    0.49
         avg:                                    0.54
         max:                                  335.24
         95th percentile:                        0.61
         sum:                                 9989.75

Threads fairness:
    events (avg/stddev):           18545.0000/0.00
    execution time (avg/stddev):   9.9898/0.00
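
As a rough mental model of this test, each sysbench CPU "event" checks the integers up to `--cpu-max-prime` for primality by trial division, so events per second reflects single-thread integer throughput. The Python sketch below is an analogue written for illustration, not sysbench's actual code, and its absolute numbers are not comparable with the sysbench output above.

```python
import math
import time


def count_primes(limit: int) -> int:
    """Count the primes <= limit using trial division up to sqrt(n)."""
    count = 0
    for n in range(2, limit + 1):
        is_prime = True
        for divisor in range(2, math.isqrt(n) + 1):
            if n % divisor == 0:
                is_prime = False
                break
        if is_prime:
            count += 1
    return count


def events_per_second(limit: int, duration: float = 1.0) -> float:
    """Repeat the prime search for roughly `duration` seconds; one full pass is one event."""
    events = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        count_primes(limit)
        events += 1
    return events / (time.perf_counter() - start)


if __name__ == "__main__":
    print(f"events per second: {events_per_second(20000):.2f}")
```

With a single thread, the average latency is the reciprocal of the event rate: 1000 ms / 1854.13 ≈ 0.54 ms, which matches the `avg` latency in the output above.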

Memory performance (read speed)

vagrant@slave2:~$ sysbench memory --memory-block-size=1M --memory-total-size=10G --memory-oper=read run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1024KiB
  total size: 10240MiB
  operation: read
  scope: global

Initializing worker threads...

Threads started!

Total operations: 10240 (54274.69 per second)

10240.00 MiB transferred (54274.69 MiB/sec)


General statistics:
    total time:                          0.1882s
    total number of events:              10240

Latency (ms):
         min:                                    0.01
         avg:                                    0.02
         max:                                    0.31
         95th percentile:                        0.03
         sum:                                  184.98

Threads fairness:
    events (avg/stddev):           10240.0000/0.00
    execution time (avg/stddev):   0.1850/0.00

Memory write speed

vagrant@slave2:~$ sysbench memory --memory-block-size=1M --memory-total-size=10G --memory-oper=write run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1024KiB
  total size: 10240MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 10240 (31350.96 per second)

10240.00 MiB transferred (31350.96 MiB/sec)


General statistics:
    total time:                          0.3256s
    total number of events:              10240

Latency (ms):
         min:                                    0.03
         avg:                                    0.03
         max:                                    0.22
         95th percentile:                        0.04
         sum:                                  313.09

Threads fairness:
    events (avg/stddev):           10240.0000/0.00
    execution time (avg/stddev):   0.3131/0.00

Preparing test files

vagrant@slave2:~$ sysbench fileio --file-total-size=1G prepare
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

128 files, 8192Kb each, 1024Mb total
Creating files for the test...
Extra file open flags: (none)
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
...
Creating file test_file.127
1073741824 bytes written in 2.23 seconds (459.45 MiB/sec).

Random read / write

vagrant@slave2:~$ sysbench fileio --file-total-size=1G --file-test-mode=rndrw --time=60 --threads=4 run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 8MiB each
1GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      1022.59
    writes/s:                     681.73
    fsyncs/s:                     2189.21

Throughput:
    read, MiB/s:                  15.98
    written, MiB/s:               10.65

General statistics:
    total time:                          60.1410s
    total number of events:              233650

Latency (ms):
         min:                                    0.00
         avg:                                    1.03
         max:                                   19.08
         95th percentile:                        4.10
         sum:                               239821.17

Threads fairness:
    events (avg/stddev):           58412.5000/1036.58
    execution time (avg/stddev):   59.9553/0.00
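
The throughput figures in this output follow directly from the operation rates and the 16 KiB block size. A small Python check, using only numbers copied from the output above (the function name is arbitrary):

```python
def throughput_mib_per_s(ops_per_s: float, block_size_kib: float) -> float:
    """Throughput in MiB/s for a given operation rate and block size in KiB."""
    return ops_per_s * block_size_kib / 1024


if __name__ == "__main__":
    reads_per_s, writes_per_s, block_kib = 1022.59, 681.73, 16

    print(f"read:    {throughput_mib_per_s(reads_per_s, block_kib):.2f} MiB/s")   # 15.98
    print(f"written: {throughput_mib_per_s(writes_per_s, block_kib):.2f} MiB/s")  # 10.65
    print(f"read/write ratio: {reads_per_s / writes_per_s:.2f}")                  # 1.50
```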

Cleaning up test files

Remove the files that were generated for the benchmark.

sysbench fileio cleanup

Sysbench test commands

| Test | Command |
|------------------|---------------------------|
| CPU test | `sysbench cpu --cpu-max-prime=20000 run` |
| Memory read | `sysbench memory --memory-oper=read run` |
| Memory write | `sysbench memory --memory-oper=write run` |
| Disk I/O test | `sysbench fileio --file-total-size=10G --file-test-mode=rndrw run` |
| MySQL performance test | `sysbench oltp_read_write --mysql-user=root --mysql-password=<password> --mysql-db=test --tables=10 --table-size=100000 --threads=10 run` |

Network Performance

iperf3

iperf3 is a CLI tool for measuring network throughput over TCP and UDP. In UDP mode it also reports jitter and packet loss.

Installation

sudo apt install iperf3 -y

Output (server)

vagrant@slave2:~$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.56.1, port 60236
[  5] local 192.168.56.102 port 5201 connected to 192.168.56.1 port 60250
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   143 MBytes  1.20 Gbits/sec
[  5]   1.00-2.00   sec   152 MBytes  1.28 Gbits/sec
[  5]   2.00-3.00   sec   122 MBytes  1.03 Gbits/sec
[  5]   3.00-4.00   sec   169 MBytes  1.42 Gbits/sec
[  5]   4.00-5.00   sec   143 MBytes  1.20 Gbits/sec
[  5]   5.00-6.00   sec   135 MBytes  1.14 Gbits/sec
[  5]   6.00-7.00   sec   144 MBytes  1.21 Gbits/sec
[  5]   7.00-8.00   sec   138 MBytes  1.16 Gbits/sec
[  5]   8.00-9.00   sec   136 MBytes  1.14 Gbits/sec
[  5]   9.00-10.00  sec   158 MBytes  1.33 Gbits/sec
[  5]  10.00-10.54  sec   104 MBytes  1.62 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.54  sec  1.51 GBytes  1.23 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Output (client)

root@DESKTOP-SCOK45O:~# iperf3 -c 192.168.56.102
Connecting to host 192.168.56.102, port 5201
[  5] local 172.31.153.48 port 44948 connected to 192.168.56.102 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   162 MBytes  1.36 Gbits/sec  149    308 KBytes
[  5]   1.00-2.00   sec   156 MBytes  1.31 Gbits/sec  321    256 KBytes
[  5]   2.00-3.00   sec   126 MBytes  1.06 Gbits/sec  151    317 KBytes
[  5]   3.00-4.00   sec   169 MBytes  1.42 Gbits/sec  299    301 KBytes
[  5]   4.00-5.00   sec   164 MBytes  1.38 Gbits/sec  150    229 KBytes
[  5]   5.00-6.00   sec   123 MBytes  1.03 Gbits/sec  105    291 KBytes
[  5]   6.00-7.00   sec   164 MBytes  1.38 Gbits/sec  138    303 KBytes
[  5]   7.00-8.00   sec   164 MBytes  1.37 Gbits/sec  196    393 KBytes
[  5]   8.00-9.00   sec   134 MBytes  1.12 Gbits/sec  106    291 KBytes
[  5]   9.00-10.00  sec   184 MBytes  1.54 Gbits/sec  470    233 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.51 GBytes  1.30 Gbits/sec  2085             sender
[  5]   0.00-10.54  sec  1.51 GBytes  1.23 Gbits/sec                  receiver
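
The summary lines can be cross-checked by hand. The numbers are consistent with iperf3 reporting transfer sizes in binary units (1 GByte = 2^30 bytes) and bitrates in decimal units (1 Gbit = 10^9 bits); that unit convention is inferred here from the output itself. A short Python sketch with an illustrative function name:

```python
def bitrate_gbits_per_s(transfer_gbytes: float, seconds: float) -> float:
    """Convert a transfer size in GBytes (2**30 bytes) over `seconds` into Gbits/sec (10**9 bits)."""
    bits = transfer_gbytes * 2**30 * 8
    return bits / seconds / 1e9


if __name__ == "__main__":
    print(f"sender:   {bitrate_gbits_per_s(1.51, 10.00):.2f} Gbits/sec")  # 1.30
    print(f"receiver: {bitrate_gbits_per_s(1.51, 10.54):.2f} Gbits/sec")  # 1.23
```

The sender and receiver rates differ only because the same 1.51 GBytes is divided by different intervals (10.00 s versus 10.54 s).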

UDP test

iperf3 -u -c <server IP> -b 100M

Netperf

Installation

sudo apt install netperf -y

Server

vagrant@slave2:~$ netserver

Client

root@DESKTOP-SCOK45O:~# netperf -H 192.168.56.102
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.56.102 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    10.02    1356.32

Storage

fio

fio is a powerful tool for benchmarking disk I/O performance. It can measure random and sequential read/write speeds, among other things.

Installation

sudo apt install fio -y

Random read

vagrant@slave2:~$ fio --name=random_read --ioengine=libaio --rw=randread --bs=4k --size=1G --numjobs=4 --runtime=60 --group_reporting
random_read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.16
Starting 4 processes
random_read: Laying out IO file (1 file / 1024MiB)
random_read: Laying out IO file (1 file / 1024MiB)
random_read: Laying out IO file (1 file / 1024MiB)
random_read: Laying out IO file (1 file / 1024MiB)
Jobs: 4 (f=4): [r(4)][100.0%][r=15.4MiB/s][r=3945 IOPS][eta 00m:00s]
random_read: (groupid=0, jobs=4): err= 0: pid=73094: Sat Mar 22 10:06:39 2025
...
Run status group 0 (all jobs):
   READ: bw=15.0MiB/s (15.7MB/s), 15.0MiB/s-15.0MiB/s (15.7MB/s-15.7MB/s), io=901MiB (945MB), run=60001-60001msec

Disk stats (read/write):
    dm-0: ios=230603/90, merge=0/0, ticks=153896/12, in_queue=153908, util=99.94%, aggrios=230603/38, aggrmerge=0/61, aggrticks=210473/60, aggrin_queue=44, aggrutil=99.86%
  sda: ios=230603/38, merge=0/61, ticks=210473/60, in_queue=44, util=99.86%
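
fio prints bandwidth in both binary (MiB/s) and decimal (MB/s) units, and the group summary is simply total I/O divided by run time. The following Python sketch reproduces the `READ:` line from the values in the output above; the helper names are illustrative only.

```python
MIB = 1024 * 1024  # bytes in one mebibyte
MB = 1000 * 1000   # bytes in one megabyte


def bandwidth_mib_per_s(io_mib: float, runtime_ms: float) -> float:
    """Average bandwidth in MiB/s from total I/O (MiB) and run time (ms)."""
    return io_mib / (runtime_ms / 1000)


def mib_to_mb(value_mib: float) -> float:
    """Convert mebibytes (2**20 bytes) to megabytes (10**6 bytes)."""
    return value_mib * MIB / MB


if __name__ == "__main__":
    bw = bandwidth_mib_per_s(io_mib=901, runtime_ms=60001)
    print(f"bw = {bw:.1f} MiB/s ({mib_to_mb(bw):.1f} MB/s)")  # bw = 15.0 MiB/s (15.7 MB/s)
    print(f"io = 901 MiB ({mib_to_mb(901):.0f} MB)")          # io = 901 MiB (945 MB)
```

At a 4 KiB block size, 15.0 MiB/s corresponds to about 3,840 IOPS on average, in line with the 3945 IOPS shown on the progress line.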

Random write

fio --name=random_write --ioengine=libaio --rw=randwrite --bs=4k --size=1G --numjobs=4 --runtime=60 --group_reporting

IOPing

ioping measures disk I/O latency. It works like the network ping command, but reports the response time of a disk instead of a remote host.
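
The idea can be sketched in Python: issue small reads at random offsets and time each one. This is only an approximation for illustration. It reads through the page cache, so the latencies it reports are likely far lower than real device latency, and the function name `sample_read_latencies` is invented here.

```python
import os
import random
import statistics
import tempfile
import time


def sample_read_latencies(path: str, requests: int = 10, block_size: int = 4096) -> list[float]:
    """Read `block_size` bytes at random aligned offsets; return per-request latency in ms."""
    size = os.path.getsize(path)
    max_block = max(size // block_size - 1, 0)
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(requests):
            offset = random.randint(0, max_block) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # positional read, available on Unix-like systems
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Use a 4 MiB scratch file so the example does not depend on any existing path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    try:
        lat = sample_read_latencies(tmp.name)
        print(f"min/avg/max = {min(lat):.3f} / {statistics.mean(lat):.3f} / {max(lat):.3f} ms")
    finally:
        os.unlink(tmp.name)
```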

S3

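S3 is object storage accessed over the network rather than a block device, so fio and ioping cannot be applied to it directly. Tools designed for S3, such as s5cmd, aws s3 cp, and s3-benchmark, can be used instead.

| Benchmark target | Method | Tools |
|------------------|---------------------------|----------------------------|
| Local disk performance | Random/sequential read and write | fio, ioping |
| S3 upload performance | Upload speed for large files | s5cmd, aws s3 cp |
| S3 download performance | Download speed for large files | s5cmd, aws s3 cp |
| Overall S3 performance | Parallel processing and IOPS measurement | s3-benchmark |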

Spark on Kubernetes

HiBench (comprehensive Spark benchmark)

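Pros: Supports a wide range of Spark workloads (SQL, graph, machine learning)
Cons: Built around YARN by default, so running it on Kubernetes requires some configuration changes

Recommended for: measuring overall Spark performance (data processing speed, CPU/memory utilization, etc.)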

SparkPerf (Databricks Spark Performance Tests)

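Pros: A Spark performance test suite created by Databricks
Cons: Public documentation is sparse

Recommended for: testing Spark internals (shuffling, joins, caching, etc.)

TPC-DS / TPC-H benchmarks (query performance)

Pros: Tests Spark SQL performance in a standardized way
Cons: Setup is somewhat complex, and loading the data takes time

Recommended for: benchmarking Spark SQL (OLAP, large-scale query performance)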
