정글 45일차

윤종성·2024년 8월 14일

동적 메모리 할당기

c언어

목록 보기

4/4

malloc-lab 구현

1. 오류 해결 과정

묵시적 가용 리스트, naive 명시적 가용 리스트 구현 중 생긴 오류는 너무 많아 기억도 안 나 패스.

분리 가용 리스트(그 중 segregated-fit) 구현 과정에서 발생한 오류만 정리해 봤다.

1-1. sbrk 오류

-> 힙을 확장을 너무 많이해서 최대 힙 크기를 넘아가면 발생한다.
-> free 또는 적절한 할당이 이루어지지 않아 외부 단편화가 계속 발생하거나, 할당할 수 없는 가용블록들이 많아져서 생긴다.
-> free, 할당 절차가 잘 이루어지는 지 다시 확인한다.

연결부분에서 연결 후 블록의 크기가 변하면 새로 리스트를 찾아 넣어줘야 하는데 안함
place함수에서 분할 후 작아진 조각을 안 넣어줌
find-fit 함수에서 마지막 리스트(제일 큰 블록 리스트)를 검사를 안해서 계속 추가 힙을 요구하게 됨

for (; bp == NULL && exp < N_CLASS-1; exp++) // 전
for (; bp == NULL && exp < N_CLASS; exp++) // 후

1-2. 분리 가용 리스트 속도가 너무 느리다

명시적 가용 리스트보다 분리 가용 리스트는 속도가 빨라야 하는 것이 정상인데 오히려 더 느려지는 현상이 나타났다.

효율성은 초당 명령어 수행 수로 계산하는데
(총 명령어 수)/(총 소요시간)으로 계산하므로, 소요시간이 오래 걸리는 작업이 주는 영향이 크다.
소요시간이 오래 걸리는 부분을 살펴 봤다.

개선 전

trace	valid	util	ops	secs	Kops
0	yes	99%	5694	0.000233	24438
1	yes	97%	5848	0.000242	24155
2	yes	98%	6648	0.000284	23408
3	yes	99%	5380	0.000223	24115
4	yes	66%	14400	0.000313	46036
5	yes	93%	4800	0.000329	14581
6	yes	90%	4800	0.000335	14328
7	yes	54%	12000	0.000323	37163
8	yes	47%	24000	0.149107	161
9	yes	31%	14401	0.140725	102
10	yes	37%	14401	0.003863	3728
Total		74%	112372	0.295977	380

개선1. (trace 8)분리 가용 리스트를 위해 새로 생긴 작업(find_class)에 문제가 있을 것이다.

사이즈를 2씩 나눠서 구하는 것보다

int find_class(size_t size) {
    int pow = 0;

    while (size > 1 && pow < N_CLASS-1) {
        size /= 2;
        pow++;
    }

    return pow;
}

i를 2씩 곱해서 구하는 것이 훨씬 빠르다.
GPT가 위에 방법으로 하래서 변수 하나 덜 쓰니까 좋겠지 하고 썼던 건데 당했다..

int find_class(size_t size) {
    int pow = 0, i = 1;

    while (size > i && pow < N_CLASS-1) {
        i = i << 1;
        pow++;
    }

    return pow;
}

trace	valid	util	ops	secs	Kops
0	yes	96%	5694	0.000253	22470
1	yes	93%	5848	0.000296	19770
2	yes	97%	6648	0.000321	20743
3	yes	98%	5380	0.000242	22259
4	yes	66%	14400	0.000376	38318
5	yes	92%	4800	0.000408	11762
6	yes	90%	4800	0.000393	12204
7	yes	54%	12000	0.000337	35640
8	yes	47%	24000	0.000561	42796
9	yes	32%	14401	0.131994	109
10	yes	27%	14401	0.003992	3608
Total		72%	112372	0.139172	807

개선2. (trace 10) realloc을 테스트하는 trace에서 속도 저하가 심하다. realloc을 고치자.

기본으로 구현된 함수는 realloc을 무조건 블록을 새로 할당하고 옮겨준다.(free & alloc)
다시 할당하는 작업이 연결 등 처리시간이 기므로 새로 할당하는 경우를 줄여줘야 한다.
작을 때에는 작업을 하지 않고,
클 때에는 확장이 가능(뒤에 가용블록이 있어서)한 경우에는 제자리에서 확장하도록 고쳤다.

그 과정에서 문제가 또 발생했다:
1. size를 줄이는 상황인데도 확장을 시도했다.
기존 블록에서 GET_SIZE로 얻을 수 있는 크기는 오버헤드를 포함한 블록 전체의 크기이고, 인자로 받은size는 페이로드 기준 크기이므로 직접 비교하면 안 된다.
2. heap 바깥으로 페이로드가 벗어났다.
마찬가지로 old_size는 블록 기준, size는 페이로드 기준 크기이므로 이를 통해 증가시야 할 크기 dsize를 계산하면 산술 오버플로우가 발생해서 매우 큰 크기로 블록을 확장시버릴 수 있다.

trace	valid	util	ops	secs	Kops
0	yes	97%	5694	0.000233	24448
1	yes	98%	5848	0.000235	24864
2	yes	98%	6648	0.000281	23642
3	yes	98%	5380	0.000231	23341
4	yes	66%	14400	0.000338	42654
5	yes	93%	4800	0.000335	14341
6	yes	90%	4800	0.000336	14269
7	yes	55%	12000	0.000318	37736
8	yes	51%	24000	0.000541	44403
9	yes	33%	14401	0.057987	248
10	yes	29%	14401	0.000327	44026
Total		73%	112372	0.061161	1837

Perf index = 44 (util) + 40 (thru) = 84/100

개선3. (trace 9) 여전히 느리다.

trace 9은 큰 할당이 많이 일어난다. 크기 클래스의 수를 늘려보자

20개

trace	valid	util	ops	secs	Kops
0	yes	98%	5694	0.000237	24056
1	yes	97%	5848	0.000266	21993
2	yes	98%	6648	0.000302	22013
3	yes	99%	5380	0.000228	23576
4	yes	66%	14400	0.000355	40541
5	yes	93%	4800	0.000349	13765
6	yes	90%	4800	0.000354	13552
7	yes	55%	12000	0.000333	36090
8	yes	51%	24000	0.000561	42750
9	yes	38%	14401	0.000736	19556
10	yes	29%	14401	0.000336	42898
Total		74%	112372	0.004057	27699

Perf index = 44 (util) + 40 (thru) = 84/100

30개

trace	valid	util	ops	secs	Kops
0	yes	98%	5694	0.000260	21934
1	yes	97%	5848	0.000256	22844
2	yes	98%	6648	0.000319	20834
3	yes	99%	5380	0.000248	21720
4	yes	65%	14400	0.000348	41379
5	yes	93%	4800	0.000375	12810
6	yes	90%	4800	0.000354	13548
7	yes	55%	12000	0.000354	33937
8	yes	51%	24000	0.000672	35693
9	yes	38%	14401	0.000730	19717
10	yes	29%	14401	0.000327	44067
Total		74%	112372	0.004243	26487

	Perf index = 44 (util) + 40 (thru) = 84/100

알고보니 처리량 만점이 40점이다. 이용도를 조금더 늘리기 위해 best-fit 검색을 사용하자
Results for mm malloc:

trace	valid	util	ops	secs	Kops
0	yes	99%	5694	0.000248	22987
1	yes	99%	5848	0.000245	23879
2	yes	99%	6648	0.000282	23600
3	yes	99%	5380	0.000233	23051
4	yes	66%	14400	0.000345	41691
5	yes	96%	4800	0.000523	9181
6	yes	95%	4800	0.000504	9533
7	yes	55%	12000	0.000364	32931
8	yes	51%	24000	0.000586	40921
9	yes	28%	14401	0.000914	15754
10	yes	35%	14401	0.000328	43892
Total		75%	112372	0.004573	24576

Perf index = 45 (util) + 40 (thru) = 85/100

trace 9은 오히려 이용도가 줄었다.
테스트 케이스를 살펴보니 같은 블록을 계속해서 증가시키며 realloc을 요청한다.
오히려 딱 맞는 블록에 할당해서 크기를 늘릴 때마다 새로운 블록으로 이동하며 외부 단편화가 증가한 것 같다.

first-fit과 주소 순서 리스트를 이용해서 빠르게 best-fit의 효과를 얻어보자 (왜 주소 순서로 배치된 first-fit 방식은 이용도가 높은가?)

Trace	Valid	Util	Ops	Secs	Kops
0	yes	98%	5694	0.000249	22858
1	yes	97%	5848	0.000269	21740
2	yes	98%	6648	0.000309	21535
3	yes	99%	5380	0.000247	21808
4	yes	66%	14400	0.000320	44944
5	yes	93%	4800	0.000396	12130
6	yes	90%	4800	0.000385	12480
7	yes	55%	12000	0.000348	34473
8	yes	51%	24000	0.000543	44215
9	yes	38%	14401	0.000789	18250
10	yes	29%	14401	0.000326	44161
Total		74%	112372	0.004180	26881

Perf index = 44 (util) + 40 (thru) = 84/100

보너스. trace 4는 4kb보다 살짝 큰 블록을 반복적으로 할당한다.

페이지 크기(chunksize)를 4kb로 설정해 두었으므로 최초 힙 추가 시 딱맞게 할당해주면 이용도가 99%가 된다.

Trace	Valid	Util	Ops	Secs	Kops
0	yes	99%	5694	0.000275	20728
1	yes	99%	5848	0.000262	22346
2	yes	99%	6648	0.000296	22437
3	yes	99%	5380	0.000227	23742
4	yes	98%	14400	0.000313	46036
5	yes	96%	4800	0.000527	9103
6	yes	95%	4800	0.000545	8815
7	yes	55%	12000	0.000363	33040
8	yes	51%	24000	0.000596	40282
9	yes	28%	14401	0.000895	16092
10	yes	35%	14401	0.000329	43812
Total		78%	112372	0.004626	24289

Perf index = 47 (util) + 40 (thru) = 87/100

2. 정리

명시적 가용 리스트를 구현. 분리 가용 리스트 중 분리 맞춤(segregated-fit) 방식을 사용했다.
크기 클래스는 2의 지수로 구성했다. 그 개수는 18개로 설정했다.(테스트 케이스에 크기가 큰 블록 할당 요청이 많음)
best-fit이 점수에는 유리하나 잃는 속도에 비해 이용도 개선이 크지 않아서 first-fit에 대신 주소 정렬 방식을 이용했다.
분리 가용 리스트에서 선임자와 후임자의 참조는 구조체 참조를 사용하면 편리하다.

// 매크로 등을 이용할 경우 방식
PUT(*(size_t **)((char *)bp + 4), (*(size_t **)bp));

// 구조체 참조를 이용할 경우
typedef struct _node{
    struct _node *prev;
    struct _node *next;
} Node;

((Node *)bp)->next->prev = ((Node *)bp)->prev;

읽기도 쉽고 사용하기도 편하다.
구조체도 역시 연속된 바이트로 구성된 새로운 자료형이라는 것을 생각한다면 작동 방식은 알기 쉽다.

3. 전체 코드

/*
 * mm-naive.c - The fastest, least memory-efficient malloc package.
 * 
 * In this naive approach, a block is allocated by simply incrementing
 * the brk pointer.  A block is pure payload. There are no headers or
 * footers.  Blocks are never coalesced or reused. Realloc is
 * implemented directly using mm_malloc and mm_free.
 *
 * NOTE TO STUDENTS: Replace this header comment with your own header
 * comment that gives a high level description of your solution.
 */
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <string.h>

#include "mm.h"
#include "memlib.h"
// #include "memlib.c"

team_t team = {
    /* Team name */
    "ateam",
    /* First member's full name */
    "Harry Bovik",
    /* First member's email address */
    "bovik@cs.cmu.edu",
    /* Second member's full name (leave blank if none) */
    "",
    /* Second member's email address (leave blank if none) */
    ""
};

#define ALIGN 8 // alignment condition
#define HEAD_SIZE 4 // amount of header/footer
#define CHUNKSIZE (1<<12) // extend heap by this amount (bytes)
#define ALIGNED_SIZE(size) ((size + HEAD_SIZE + HEAD_SIZE + ALIGN-1) & ~(ALIGN-1)) // required block size(header+payload+footer). round up to alignment condition.
#define MIN_SIZE ALIGNED_SIZE(1+2*HEAD_SIZE)
#define MAX(x, y) ((x) > (y) ? (x) : (y))
#define GET(p) (*(size_t *)(p))
#define PUT(p, val) (*(size_t *)(p) = (val))

#define ADDR_HEAD(bp) ((char *)(bp) - HEAD_SIZE) // get address of header

#define GET_SIZE(bp) (GET(ADDR_HEAD(bp)) & ~(ALIGN-1))
#define IS_ALLOC(bp) (GET(ADDR_HEAD(bp)) & 0x1)

#define PREV_BLOCK(bp) ((char *)(bp) - (GET((char *)(bp) - 2*HEAD_SIZE) & ~(ALIGN-1)))
#define NEXT_BLOCK(bp) ((char *)(bp) + GET_SIZE(bp))

#define ADDR_FOOT(p) ((char *)(p) + GET_SIZE(p) - HEAD_SIZE - HEAD_SIZE) // get address of footer

#define N_CLASS 18 // # of size classes

typedef struct _node{
    struct _node *prev;
    struct _node *next;
} Node;

static char *heap_listp;
// 분리 가용 리스트들의 헤드 노드 배열
static Node *segregated_list;
static void *coalesce(Node *bp);
static void *extend_heap(size_t words);

// 연결 리스트에서 블록을 삭제
void disconnect_block(Node *bp) {
    bp->prev->next = bp->next;
    if (bp->next != NULL)
        bp->next->prev = bp->prev;
}

// 연결 리스트에 블록을 끼워넣기
void connect_block(Node *pre, Node *bp) {
    if (pre->next != NULL)
        pre->next->prev = bp;
    bp->next = pre->next;
    pre->next = bp;
    bp->prev = pre;
}

// 크기 클래스를 반환하는 함수
// log2(words)를 올림한 수를 반환함 (2->1, 3->2, 4->2, 5->3, ...)
int find_class(size_t words) {
    int pow = 0, i = 1;
    
    while (words > i && pow < N_CLASS-1) {
        i = i << 1;
        pow++;
    }
    return pow;
}

//////////////////////////////////////////////////////////////////////////////////////////

int mm_init(void)
{
    size_t size = ALIGNED_SIZE(N_CLASS*2*HEAD_SIZE);

    heap_listp = mem_sbrk(size+2*HEAD_SIZE);
    if (heap_listp == (void *)-1) return -1;

    heap_listp += ALIGN - HEAD_SIZE;

    PUT(heap_listp, size+1); // prologue header
    PUT(heap_listp+size-HEAD_SIZE, size+1); // prologue footer
    PUT(heap_listp+size, 1); // epilogue header

    heap_listp += HEAD_SIZE; // points prologue payload

    
    // 프롤로그 블록에 크기 클래스 별 선임자/후임자 포인터를 저장하고 사용함
    // 크기 클래스 3의 헤더 노드에 접근하기 위해서는 (segregated_list + 3)사용
    segregated_list = (Node *)heap_listp;

    for (int i = 0; i < N_CLASS; i++) {
        (segregated_list[i]).prev = NULL;
        (segregated_list[i]).next = NULL;
    }

    if (extend_heap(CHUNKSIZE/HEAD_SIZE) == NULL)
        return -1;

    return 0;
}

static void *extend_heap(size_t words) {
    void *bp;
    size_t size = (words%2) ? (words+1) * HEAD_SIZE : words * HEAD_SIZE;

    bp = mem_sbrk(size);
    if (bp == (void *)-1) return NULL;

    PUT(ADDR_HEAD(bp), size); // header of new free block
    PUT(ADDR_FOOT(bp), size); // footer of new free block
    PUT(ADDR_HEAD(NEXT_BLOCK(bp)), 1); // header of epilogue block

    connect_block(segregated_list + find_class(size/ALIGN), (Node *)bp);

    return coalesce(bp);
}

static void *coalesce(Node *bp) {
    Node *prev_bp = (Node *)PREV_BLOCK(bp);
    Node *next_bp = (Node *)NEXT_BLOCK(bp);
    size_t size = GET_SIZE(bp);

    if (!IS_ALLOC(next_bp)) {
        size += GET_SIZE(next_bp);
        disconnect_block(next_bp);
        PUT(ADDR_HEAD(bp), size);
        PUT(ADDR_FOOT(bp), size);
    }
    if (!IS_ALLOC(prev_bp)) {
        size += GET_SIZE(prev_bp);
        disconnect_block(bp);
        PUT(ADDR_HEAD(prev_bp), size);
        PUT(ADDR_FOOT(prev_bp), size);
        bp = prev_bp;
    }

    // (bp의 크기가 변경될 수 있으므로)
    // 올바른 크기 클래스로 넣어주기 위해 기존 블록을 리스트에서 삭제하고 다시 넣음
    disconnect_block(bp);
    connect_block(segregated_list+find_class(size/ALIGN), bp);

    return bp;
}

static void *find_fit(size_t asize) {
    Node *bp = NULL;
    int exp = find_class(asize/ALIGN);

    // first-fit
    for (; bp == NULL && exp < N_CLASS; exp++) {
        bp = (segregated_list + exp)->next;
        while (bp != NULL && GET_SIZE(bp) < asize)
            bp = bp->next;
    }

    return bp;

    // // best-fit
    // Node *smallest_bp = NULL;
    // size_t size, smallest_size = -1;
    // for (; smallest_bp == NULL && exp < N_CLASS; exp++) {
    //     bp = (segregated_list + exp)->next;
    //     while (bp != NULL) {
    //         size = GET_SIZE(bp);
    //         // 블록을 넣을 수 있는 경우
    //         // 딱 맞는 경우 바로 반환
    //         if (size == asize)
    //             return bp;
    //         // 블록이 넉넉하고 최소 블록 크기인 경우 갱신
    //         else if (size > asize && size < smallest_size) {
    //             smallest_bp = bp;
    //             smallest_size = size;
    //         }
    //         else
    //             bp = bp->next;
    //     }
    // }
    
    // return smallest_bp;
}

static void place(Node *bp, size_t asize) {
    size_t old_size;
    Node *next;

    // 배치할 가용 블록이 충분히 큰 경우 분할한다.
    if (asize < (old_size = GET_SIZE(bp)) - MIN_SIZE) {
        disconnect_block(bp);
        PUT(ADDR_HEAD(bp), asize+1);
        PUT(ADDR_FOOT(bp), asize+1);

        next = (Node *)(((char *)bp) + asize);
        PUT(ADDR_HEAD(next), old_size-asize);
        PUT(ADDR_FOOT(next), old_size-asize);
        connect_block(segregated_list + find_class((old_size-asize)/ALIGN), next);
    }
    // 크지 않은 경우 분할하지 않고 가용블록 전체를 할당한다.
    else {
        disconnect_block(bp);
        PUT(ADDR_HEAD(bp), old_size+1);
        PUT(ADDR_FOOT(bp), old_size+1);
    }
}

void *mm_malloc(size_t size)
{
    size_t asize; // adjusted size
    char *bp;
    
    if (size <= 0)
        return NULL;

    asize = ALIGNED_SIZE(size); // ALIGNED_SIZE 매크로는 payload 크기가 주어지면 정렬조건에 맞는 블록의 크기를 계산한다.

    if ((bp = find_fit(asize)) == NULL) {
        bp = extend_heap(MAX(asize, CHUNKSIZE)/HEAD_SIZE);
        if (bp == NULL)
            return NULL;
    }

    place((Node *)bp, asize);
    return bp;
}

void mm_free(void *bp)
{
    size_t size = GET_SIZE(bp);
    Node *size_class = segregated_list + find_class((size/ALIGN));

    PUT(ADDR_HEAD(bp), size);
    PUT(ADDR_FOOT(bp), size);
    while ((size_t)(size_class->next) > (size_t)bp) size_class = size_class->next; // sort list by address. use with first-fit
    connect_block(size_class, (Node *)bp); // 새로 할당한 블록을 삽입

    coalesce(bp);
}

void *mm_realloc(void *ptr, size_t size)
{
    void *oldptr = ptr;
    void *newptr;
    void *next_block = NEXT_BLOCK(ptr);
    size_t old_size = GET_SIZE(ptr);
    size_t dsize;
    size_t copySize;
    
    // 크기를 줄이도록 요청한 경우
    // 내부 단편화를 감수하더라도 블록 유지
    if (ALIGNED_SIZE(size) <= old_size) {
        return oldptr;
    }

    // 뒤에 충분히 확장할 수 있을 크기의 가용블록이 있는 경우: 제자리에서 확장
    dsize = ALIGNED_SIZE(size)-old_size;
    if (!IS_ALLOC(next_block) && dsize <= GET_SIZE(next_block)) {
        place(next_block, dsize);
        size = GET_SIZE(next_block);

        PUT(ADDR_HEAD(oldptr), old_size+size+1);
        PUT(ADDR_FOOT(oldptr), old_size+size+1);

        return oldptr;
    }
    // 제자리 확장이 불가능한 경우에만 새로 할당 & 기존블록 반환
    else {
        newptr = mm_malloc(size);
        if (newptr == NULL)
            return NULL;
        copySize = *(size_t *)((char *)oldptr - sizeof(size_t));
        if (size < copySize)
            copySize = size;
        memcpy(newptr, oldptr, copySize);
        mm_free(oldptr);
        return newptr;
    }
}

윤종성

알을 깬 개발자

이전 포스트