Perf tool

EEEFFEE · January 27, 2024


First written: 2024-01-27

1. Perf tool

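The official performance analysis tool for the Linux operating system, included in the Linux source tree (tools/perf).
Provides an interface for reading performance counters from user space:
- Overall performance measurement
- HW event monitoring
- SW event tracing
Supported counters and tracepoints differ by architecture and kernel, so check what is available on the target system.
Each event is counted independently.
Basic command format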

perf <subcommand> <event specification>

perf stat -e <event name 1>,<event name 2>... <application name>
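The command format above can be assembled programmatically. The following is a minimal sketch, not part of perf itself: `build_perf_stat_cmd` is a hypothetical helper name, and it only builds the argument list without running anything.

```python
import shlex


def build_perf_stat_cmd(events, app_argv):
    """Return the argv list for `perf stat -e <events> <application>`.

    Hypothetical helper for illustration; it does not check whether
    the events exist on the current machine.
    """
    if not events:
        raise ValueError("at least one event name is required")
    # perf takes all event names as one comma-separated argument to -e.
    return ["perf", "stat", "-e", ",".join(events), *app_argv]


cmd = build_perf_stat_cmd(["bus-cycles", "instructions"], ["sleep", "1"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where perf is installed.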

1.1 Perf event

The available events can be listed with perf list.

1.1.1 Hardware event

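branch-misses : number of branches whose prediction turned out to be wrong, so the speculatively fetched instructions were discarded
(a high count relative to the total number of branches suggests the code is hard to predict)
bus-cycles : number of bus clock cycles that elapsed
(the exact meaning is CPU-specific; it is not a direct count of memory transfers or cache misses)
cache-misses : number of cache misses
cache-references : number of cache accesses
cpu-cycles OR cycles : number of CPU clock cycles consumed
(shows how busy the CPU was; the count depends on the CPU clock frequency)
instructions : number of instructions the CPU executed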

perf stat -e bus-cycles,instructions sleep 1
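Raw counts become easier to interpret as ratios. The sketch below derives instructions per cycle and a cache miss ratio from the hardware events listed above. The function names and the numbers are made up for illustration and are not real measurements.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize_hw_counts(counts):
    """Derive two common ratios from a dict of event name -> count."""
    return {
        # Instructions per cycle: higher generally means better CPU utilization.
        "ipc": ratio(counts["instructions"], counts["cpu-cycles"]),
        # Fraction of cache accesses that missed.
        "cache_miss_ratio": ratio(counts["cache-misses"], counts["cache-references"]),
    }


# Made-up counts, only to show the arithmetic.
example_counts = {
    "instructions": 2_000_000,
    "cpu-cycles": 1_000_000,
    "cache-misses": 50,
    "cache-references": 1_000,
}
summary = summarize_hw_counts(example_counts)
print(summary)
```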

1.1.2 Software event

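alignment-faults : number of times an unaligned data access had to be fixed up
bpf-output : event that BPF programs use to pass data to perf
cgroup-switches : number of context switches between cgroups (groups of processes that share a set of Linux resources)
context-switches OR cs : number of context switches
cpu-migrations OR migrations : number of times a thread running on one CPU was moved to another CPU
emulation-faults : number of times an instruction not supported by the hardware was emulated
major-faults : number of major page faults (the page had to be read from disk)
minor-faults : number of minor page faults (resolved without disk I/O)
page-faults OR faults : number of page faults
task-clock : time the process spent running on a CPU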

perf stat -e context-switches,task-clock sleep 1
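For scripting, perf stat can print machine-readable output with the `-x <separator>` option, where the counter value, its unit, and the event name appear in that order when no per-interval or per-CPU options are used. The sketch below parses such text. The sample input is hand-written for illustration, not captured from a real run, and `parse_perf_stat_csv` is a hypothetical helper name.

```python
def parse_perf_stat_csv(text, sep=","):
    """Map event name -> count from `perf stat -x <sep>` style output.

    Assumes the first three fields are value, unit, event name.
    Values such as "<not supported>" are stored as None.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split(sep)
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        value, event = fields[0], fields[2]
        try:
            counts[event] = float(value)
        except ValueError:
            counts[event] = None
    return counts


# Hand-written sample in the documented field order (not real output).
sample = """\
1.05,msec,task-clock,1050000,100.00,0.001,CPUs utilized
2,,context-switches,1050000,100.00,1.905,K/sec
<not supported>,,bus-cycles,0,100.00,,
"""
parsed = parse_perf_stat_csv(sample)
print(parsed)
```

Note that perf stat writes its results to stderr, so a wrapper using `subprocess.run` would need to capture stderr rather than stdout.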

1.1.3 Hardware cache event

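L1-dcache-load-misses : number of loads whose data was not in the L1 data cache
L1-dcache-loads : number of loads from the L1 data cache
L1-dcache-store-misses : number of stores that missed the L1 data cache
L1-dcache-stores : number of stores to the L1 data cache
L1-icache-load-misses : number of instruction fetches that missed the L1 instruction cache
L1-icache-loads : number of instruction fetches from the L1 instruction cache
branch-load-misses : number of times the prediction of the next instruction to execute failed
branch-loads : number of branch prediction lookups
dTLB-load-misses : number of loads whose address translation was not in the data TLB
dTLB-store-misses : number of stores whose address translation was not in the data TLB
iTLB-load-misses : number of instruction fetches whose address translation was not in the instruction TLB
node-loads : number of loads from a node, the unit into which NUMA divides memory
node-stores : number of stores to a NUMA node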

perf stat -e L1-dcache-load-misses,L1-dcache-loads sleep 1
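The cache events above come in pairs: an access count ending in `s` (for example L1-dcache-loads) and a matching `-misses` count. The sketch below uses that naming pattern to compute a miss ratio. The helper names are hypothetical, the counts are made up, and not every pair is available on every CPU.

```python
def access_event_for(miss_event):
    """Return the access event paired with a '-misses' event name.

    Relies on the naming pattern in the list above,
    e.g. 'L1-dcache-load-misses' -> 'L1-dcache-loads'.
    """
    suffix = "-misses"
    if not miss_event.endswith(suffix):
        raise ValueError(f"not a miss event: {miss_event}")
    return miss_event[: -len(suffix)] + "s"


def miss_ratio(counts, miss_event):
    """Return misses / accesses, or None if either count is unavailable."""
    misses = counts.get(miss_event)
    accesses = counts.get(access_event_for(miss_event))
    if misses is None or not accesses:
        return None
    return misses / accesses


# Made-up counts, only to show the calculation.
example = {"L1-dcache-loads": 200, "L1-dcache-load-misses": 10}
print(access_event_for("L1-dcache-load-misses"))
print(miss_ratio(example, "L1-dcache-load-misses"))
```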

1.1.4 Arm SPE counters

Event counters available only on the Arm architecture (SPE: Statistical Profiling Extension)

2. Perf lock

Monitors and analyzes lock events in the system.
The kernel must be built with CONFIG_LOCKDEP and CONFIG_LOCK_STAT enabled for this to work.
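Whether those options are enabled can be checked from the kernel config text. The sketch below scans config text for the two options. `check_lock_options` is a hypothetical helper, and the sample text is hand-written. Where the config file lives (often /boot/config-<kernel version> or /proc/config.gz) depends on the distribution, so reading it is left out.

```python
REQUIRED_OPTIONS = ("CONFIG_LOCKDEP", "CONFIG_LOCK_STAT")


def check_lock_options(config_text):
    """Return {option: bool} for the options perf lock depends on."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled built-in options look like 'CONFIG_FOO=y';
        # disabled ones appear as '# CONFIG_FOO is not set'.
        if line.endswith("=y") and not line.startswith("#"):
            enabled.add(line[: -len("=y")])
    return {name: name in enabled for name in REQUIRED_OPTIONS}


# Hand-written sample config fragment (not from a real kernel).
sample_config = """\
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
"""
status = check_lock_options(sample_config)
print(status)
```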

# Record lock-related events
perf lock record <command>

perf lock record dd if=/dev/zero of=/dev/null bs=1M count=1024

# Show the recorded data as a report
perf lock report

# Show the raw lock events from the recording
perf lock script

2.1 Perf script

Application that generates lock contention

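import multiprocessing
import random
import time


def worker(lock, counter):
    """Contend for the shared lock for about 5 seconds."""
    end_time = time.time() + 5

    while time.time() < end_time:
        # Hold the lock while printing, incrementing, and sleeping,
        # so the other process has to wait for it.
        with lock:
            print(counter.value)
            counter.value += 1
            time.sleep(random.uniform(0, 1))
        # The lock is released when the with block exits.


if __name__ == "__main__":
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value('i', 0)

    # Pass the shared objects explicitly so the workers do not rely on globals.
    p1 = multiprocessing.Process(target=worker, args=(lock, counter))
    p2 = multiprocessing.Process(target=worker, args=(lock, counter))
    p1.start()
    p2.start()

    p1.join()
    p2.join()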
