[Java] Performance Benchmarking with JMH (Java Microbenchmark Harness)

아두치 · March 31, 2022

🚀 JMH(Java Microbenchmark Harness)

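JMH is a library for measuring the performance of code that runs on the JVM.
Strictly speaking, an accurate measurement has to account for the system overhead that arises while the code runs, such as HotSpot VM overhead or GC overhead, and this differs depending on which virtual machine product you use. For simple code, or for comparing the relative performance of several pieces of code, JMH is an easy tool to reach for.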
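To make the overhead concern concrete, here is a minimal hand-rolled timing sketch that uses only System.nanoTime. It is not JMH and not the author's code; the names NaiveTiming, sum, and timeRounds are made up for illustration. On a typical JVM the first rounds tend to be slower than later ones because the loop has not been JIT-compiled yet, which is one of the effects a harness like JMH is designed to handle. The exact timings will differ from machine to machine.

```java
// Hypothetical hand-rolled timing loop for illustration; this is not part of JMH.
class NaiveTiming {
    // Sums 1..n with a plain loop (the kind of code we want to measure).
    static long sum(long n) {
        long result = 0;
        for (long i = 1; i <= n; i++) {
            result += i;
        }
        return result;
    }

    // Times `rounds` consecutive calls of sum(n) and returns the elapsed nanoseconds of each call.
    static long[] timeRounds(int rounds, long n) {
        long[] elapsed = new long[rounds];
        long sink = 0; // accumulate results so the calls are harder for the JIT to discard
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += sum(n);
            elapsed[r] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(sink); // keeps `sink` observably used
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] elapsed = timeRounds(10, 10_000_000L);
        for (int r = 0; r < elapsed.length; r++) {
            System.out.printf("round %d: %.3f ms%n", r + 1, elapsed[r] / 1_000_000.0);
        }
    }
}
```

A loop like this gives no warm-up phase, no separate JVM per run, and no error estimate, which is why the rest of this post relies on JMH instead.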

First, let's get the JMH library, our measurement tool.
I use Maven, so I add the following dependencies.

<properties>
    <jmh.version>1.21</jmh.version>
</properties>

<dependencies>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>${jmh.version}</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>${jmh.version}</version>
</dependency>
</dependencies>

<build>
<finalName>java-jmh</finalName>
<plugins>
    <plugin>    
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
            <source>1.8</source>
            <target>1.8</target>
            <annotationProcessorPaths>
                <path>
                    <groupId>org.openjdk.jmh</groupId>
                    <artifactId>jmh-generator-annprocess</artifactId>
                    <version>${jmh.version}</version>
                </path>
            </annotationProcessorPaths>
        </configuration>
    </plugin>
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <executions>
            <execution>
                <id>run-benchmarks</id>
                <phase>integration-test</phase>
                <goals>
                    <goal>exec</goal>
                </goals>
                <configuration>
                    <classpathScope>test</classpathScope>
                    <executable>java</executable>
                    <arguments>
                        <argument>-classpath</argument>
                        <classpath />
                        <argument>org.openjdk.jmh.Main</argument>
                        <argument>.*</argument>
                    </arguments>
                </configuration>
            </execution>
        </executions>
    </plugin>
</plugins>
</build>

Next, create the class to benchmark and write the code to be measured.

public class ReducingBenchmarkTest {
    private long N = 100000000L;

    public long sumReducing() {
        // long, not int: the sum of 1..N (5,000,000,050,000,000) overflows an int
        long result = 0;

        for (long i = 1L; i <= N; i++) {
            result += i;
        }

        return result;
    }
}

Now it is time to turn this class into a benchmark class.
Attach the following annotations to it.

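@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 2, jvmArgs = {"-Xms4G", "-Xmx4G"})
// JMH rejects instance fields such as N in a benchmark class that has no @State
@State(Scope.Benchmark)
public class ReducingBenchmarkTest {
    private long N = 100000000L;

    public long sumReducing() {
        // long, not int: the sum of 1..N (5,000,000,050,000,000) overflows an int
        long result = 0;

        for (long i = 1L; i <= N; i++) {
            result += i;
        }

        return result;
    }
}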

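@BenchmarkMode(Mode.AverageTime) : selects what is measured. Mode.AverageTime reports the average time per operation.

@OutputTimeUnit(TimeUnit.MILLISECONDS) : sets the unit in which the values measured in the mode above are reported. TimeUnit.MILLISECONDS reports them in milliseconds.

@Fork(value = 2, jvmArgs= {"-Xms4G", "-Xmx4G"}) : a single measurement run is not very trustworthy, because the system may happen to slow down for unrelated reasons at that moment. To reduce the influence of such outside factors, the benchmark is run twice, each time in a freshly forked JVM. The heap size is also fixed at 4 GB to minimize GC overhead caused by running short of heap space.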

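The annotations so far are the basic settings JMH needs in order to run a benchmark.
What remains is to mark sumReducing, the method that holds the code we actually want to measure, as the benchmark target.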

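package jmh; // package name as it appears in the run output

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 2, jvmArgs = {"-Xms4G", "-Xmx4G"})
// JMH rejects instance fields such as N in a benchmark class that has no @State
@State(Scope.Benchmark)
public class ReducingBenchmarkTest {
    private long N = 100000000L;

    @Benchmark
    public long sumReducing() {
        // long, not int: the sum of 1..N (5,000,000,050,000,000) overflows an int
        long result = 0;

        for (long i = 1L; i <= N; i++) {
            result += i;
        }

        // returning the value keeps the JIT from eliminating the loop as dead code
        return result;
    }
}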

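With this in place, JMH runs sumReducing, measures its performance, and reports the result.
For reference, by default JMH first runs 5 warm-up iterations as preparation (think of it as stretching before exercise) and then 5 measurement iterations; each iteration calls the method repeatedly for a fixed amount of time. (These counts ...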

Here is the result of running the example above.

# JMH version: 1.21
# VM version: JDK 17.0.1, OpenJDK 64-Bit Server VM, 17.0.1+12
# VM invoker: C:\sts\contents\sts-4.13.1.RELEASE\plugins\org.eclipse.justj.openjdk.hotspot.jre.full.win32.x86_64_17.0.1.v20211116-1657\jre\bin\java.exe
# VM options: -Xms4G -Xmx4G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: jmh.ReducingBenchmarkTest.sumReducing
# Parameters: (N = 100000000)

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 48.062 ms/op
# Warmup Iteration   2: 47.331 ms/op
# Warmup Iteration   3: 56.012 ms/op
# Warmup Iteration   4: 56.871 ms/op
# Warmup Iteration   5: 47.879 ms/op
Iteration   1: 50.110 ms/op
Iteration   2: 51.592 ms/op
Iteration   3: 50.792 ms/op
Iteration   4: 52.487 ms/op
Iteration   5: 49.333 ms/op


Result "jmh.ReducingBenchmarkTest.sumReducing":
  50.863 ±(99.9%) 4.748 ms/op [Average]
  (min, avg, max) = (49.333, 50.863, 52.487), stdev = 1.233
  CI (99.9%): [46.114, 55.611] (assumes normal distribution)


# Run complete. Total time: 00:01:41

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                (N)  Mode  Cnt   Score   Error  Units
ReducingBenchmarkTest.sumReducing  100000000  avgt    5  50.863 ± 4.748  ms/op

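In the output above, the Warmup Iteration lines are the results of the warm-up runs.
The Iteration lines below them are the results of the actual measurement.
The last line is the summary: an average of 50.863 ± 4.748 ms per operation.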
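The summary figures can be reproduced by hand from the five measurement iterations printed in the report. The sketch below computes the mean, the sample standard deviation, and the half-width of the 99.9% confidence interval as t * s / sqrt(n). The class and method names are made up for illustration, and the critical value 8.610 (Student's t, 4 degrees of freedom, two-sided 99.9%) is hard-coded from a standard t-table as an assumption rather than computed. The results should come out close to the reported 50.863, 1.233, and 4.748; small differences are possible because the printed scores are already rounded.

```java
// Illustrative recomputation of the JMH summary line; not JMH's own implementation.
class IterationStats {
    // Arithmetic mean of the scores.
    static double mean(double[] xs) {
        double total = 0;
        for (double x : xs) {
            total += x;
        }
        return total / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Half-width of the confidence interval: t * s / sqrt(n).
    static double errorMargin(double[] xs, double tCritical) {
        return tCritical * sampleStdev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        // Measurement iterations copied from the report, in ms/op.
        double[] scores = {50.110, 51.592, 50.792, 52.487, 49.333};
        // Assumed t critical value for df = 4 at 99.9% (two-sided), taken from a t-table.
        double tCritical = 8.610;

        System.out.printf("mean  = %.3f ms/op%n", mean(scores));
        System.out.printf("stdev = %.3f%n", sampleStdev(scores));
        System.out.printf("error = %.3f ms/op%n", errorMargin(scores, tCritical));
    }
}
```

With only five samples the t critical value is large, which is why the error (about 4.7 ms) is several times the standard deviation (about 1.2 ms).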

Now let's add a faster version of the example and measure both at the same time.

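package jmh; // package name as it appears in the run output

import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 2, jvmArgs = {"-Xms4G", "-Xmx4G"})
// JMH rejects instance fields such as N in a benchmark class that has no @State
@State(Scope.Benchmark)
public class ReducingBenchmarkTest {
    private long N = 100000000L;

    @Benchmark
    public long sumReducing() {
        // long, not int: the sum of 1..N (5,000,000,050,000,000) overflows an int
        long result = 0;

        for (long i = 1L; i <= N; i++) {
            result += i;
        }

        return result;
    }

    @Benchmark
    public long sumReducingByPararellStream() {
        // the identity for addition is 0; a non-zero identity is added once per
        // partition in a parallel reduce and produces a wrong sum
        return LongStream.rangeClosed(1, N).parallel().reduce(0L, Long::sum);
    }
}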

The results are as follows.

Benchmark                                                (N)  Mode  Cnt   Score   Error  Units
ReducingBenchmarkTest.sumReducing                  100000000  avgt    5  47.670 ± 2.930  ms/op
ReducingBenchmarkTest.sumReducingByPararellStream  100000000  avgt    5  17.089 ± 8.472  ms/op
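Before reading too much into the timings, it is worth confirming that the two approaches compute the same value, since a faster method that returns a different sum is not an improvement. The following is a minimal sketch of such a check outside JMH. The names SumCheck, loopSum, parallelSum, and closedForm are illustrative and do not appear in the benchmark class; the closed form n(n+1)/2 serves as an independent reference.

```java
import java.util.stream.LongStream;

// Illustrative correctness check, separate from the benchmark itself.
class SumCheck {
    // Plain loop, accumulating in a long so the total does not overflow.
    static long loopSum(long n) {
        long result = 0;
        for (long i = 1; i <= n; i++) {
            result += i;
        }
        return result;
    }

    // Parallel stream reduction; 0 is the identity for addition.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().reduce(0L, Long::sum);
    }

    // Gauss formula n(n+1)/2, used as an independent reference value.
    static long closedForm(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long n = 100_000_000L;
        System.out.println("loop     = " + loopSum(n));
        System.out.println("parallel = " + parallelSum(n));
        System.out.println("formula  = " + closedForm(n));
    }
}
```

All three lines should print the same number. If the accumulator were an int, or the reduce identity were something other than 0, the values would disagree.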

For reference, there are two ways to run the benchmark application: package it into a jar with a Maven build and run the jar, or launch it from a main method.
Launching it from a main method looks like this.

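import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// The wrapper class name is illustrative; main can also live in the benchmark class itself.
public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                // regular expression matched against benchmark names
                .include(ReducingBenchmarkTest.class.getSimpleName())
                .forks(2)
                .build();

        new Runner(opt).run();
    }
}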
