Instruction-level parallelism

Osori · April 2, 2021

Program Optimization, part 2 of 4

Suppose we have the following C++ code. There are two for loops: the first performs a[0]++ twice per iteration, and the second performs b[0]++ and then b[1]++.

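	#include <chrono>
	#include <iostream>
	using namespace std;

	int main()
	{
		int a[2] = {0, 0};
		int b[2] = {0, 0};
		const int N = 1024 * 1024 * 256;
		// Note: an optimizing build may fold these loops away entirely,
		// so the timings only mean something if the increments really run.

		// Loop 1: both increments touch a[0], so the second needs the first's result.
		auto start = chrono::high_resolution_clock::now();
		for (int i = 0; i < N; i++)
		{
			a[0]++;
			a[0]++;
		}
		auto end = chrono::high_resolution_clock::now();
		cout << "time(ms) : " << chrono::duration_cast<chrono::milliseconds>(end - start).count() << endl;

		// Loop 2: the increments touch different elements and do not depend on each other.
		start = chrono::high_resolution_clock::now();
		for (int i = 0; i < N; i++)
		{
			b[0]++;
			b[1]++;
		}
		end = chrono::high_resolution_clock::now();
		cout << "time(ms) : " << chrono::duration_cast<chrono::milliseconds>(end - start).count() << endl;
	}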

What does the result look like? Surprisingly, the difference is substantial: the second for loop runs about 1.8 times faster than the first.

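The reason is that the first loop's a[0]++, a[0]++ carries a data dependency: the second increment needs the value produced by the first, so the CPU cannot execute it until the first one has finished. That makes it hard for the pipeline to overlap the two operations. In the second loop, b[0]++ and b[1]++ do not depend on each other. Each one still waits on its own value from the previous iteration, but the two chains are separate, so the CPU can execute them in parallel to some extent. That is where the difference comes from.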
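The same idea can be applied deliberately when writing a loop. The following is only a minimal sketch, not code from the original measurement: `sum_single` and `sum_two` are hypothetical names introduced here for illustration. The first keeps one running total, so every addition waits on the previous one. The second splits the work across two totals that do not depend on each other, which mirrors the b[0]++, b[1]++ case above.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each addition depends on the result of the previous one,
// so the additions form a single dependency chain.
long long sum_single(const std::vector<int>& v)
{
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); i++)
        s += v[i];
    return s;
}

// Two accumulators: s0 and s1 form two independent chains,
// so the CPU is free to overlap the two additions in each iteration.
long long sum_two(const std::vector<int>& v)
{
    long long s0 = 0, s1 = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2)
    {
        s0 += v[i];
        s1 += v[i + 1];
    }
    // Pick up the last element when the size is odd.
    if (i < v.size())
        s0 += v[i];
    return s0 + s1;
}
```

Both functions return the same sum. Whether `sum_two` is actually faster depends on the compiler and the CPU: an optimizing compiler may already perform this kind of transformation on its own, so it is worth measuring before relying on it.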