Pintos Project1-2 BSD

이재원·2024년 3월 8일

PintOS

목록 보기

1/2

Advanced Scheduler

Implement a multilevel feedback queue scheduler similar to the 4.4BSD scheduler to reduce the average response time for running jobs on your system.

Like the priority scheduler, the advanced scheduler chooses the thread to run based on priorities. However, the advanced scheduler does not do priority donation. Thus, we recommend that you have the priority scheduler working, except possibly for priority donation, before you start work on the advanced scheduler.

You must write your code to allow us to choose a scheduling algorithm policy at Pintos startup time. By default, the priority scheduler must be active, but we must be able to choose the 4.4BSD scheduler with the -mlfqs kernel option. Passing this option sets thread_mlfqs, declared in threads/thread.h, to true when the options are parsed by parse_options(), which happens early in main().

When the 4.4BSD scheduler is enabled, threads no longer directly control their own priorities. The priority argument to thread_create() should be ignored, as well as any calls to thread_set_priority(), and thread_get_priority() should return the thread's current priority as set by the scheduler. The advanced scheduler is not used in any later project.

Advanced Scheduler

4.4BSD scheduler와 유사한 다중 레벨 피드백 큐 스케줄러를 구현하여 시스템에서 작업을 실행하는 데 필요한 평균 응답 시간을 줄입니다.

고급 스케줄러는 우선순위 스케줄러와 마찬가지로 우선순위에 따라 실행할 스레드를 선택한다.
- 단, 고급 스케줄러에서는 우선기부를 하지 않는다.
- 고급 스케줄러 작업을 시작하기 전에 우선순위 기부를 제외하고 우선순위 스케줄러가 작동하도록 하는것이 좋다.
PintOS시작시 스케줄링 알고리즘을 선택할 수 있도록 코드를 작성해야한다.
기본적으로 우선순위 스케줄러는 활성화 되어있어야 하지만, 커널 옵션으로 4.4BSD 스케줄러를 선택할 수 있어야한다.
-main()시작할 때, mlfqs에서 선언된 옵션세트parse_options()를 전달하면 thread_mlfqs에서 thread/thread.h옵션을 구문 분석할 때, true로 설정된다.
4.4BSD 스케줄러가 활성화 되면 스레드는 더 이상 자신의 우선순위를 직접 제어하지 않는다.
thread_create()의 우선순위 인수는 무시되어야 하며, thread_set_priority() 및 thread_get_priority()호출은 스케줄러가 설정한 스레드의 현재 우선순위를 반환해야한다.
고급 스케줄러는 이후 프로젝트에서 사용되지 않는다.

4.4BSD Scheduler

The goal of a general-purpose scheduler is to balance threads' different scheduling needs. Threads that perform a lot of I/O require a fast response time to keep input and output devices busy, but need little CPU time. On the other hand, compute-bound threads need to receive a lot of CPU time to finish their work, but have no requirement for fast response time. Other threads lie somewhere in between, with periods of I/O punctuated by periods of computation, and thus have requirements that vary over time. A well-designed scheduler can often accommodate threads with all these requirements simultaneously.

For project 1, you must implement the scheduler described in this appendix. Our scheduler resembles the one described in [McKusick], which is one example of a multilevel feedback queue scheduler. This type of scheduler maintains several queues of ready-to-run threads, where each queue holds threads with a different priority. At any given time, the scheduler chooses a thread from the highest-priority non-empty queue. If the highest-priority queue contains multiple threads, then they run in "round robin" order.

Multiple facets of the scheduler require data to be updated after a certain number of timer ticks. In every case, these updates should occur before any ordinary kernel thread has a chance to run, so that there is no chance that a kernel thread could see a newly increased timer_ticks() value but old scheduler data values.

The 4.4BSD scheduler does not include priority donation.

4.4BSD 스케줄러

범용 스케줄러의 목표는 스레드의 다양한 스케줄링 요구 사항의 균형을 맞추는 것이다.
많은 입출력을 수행하는 스레드는 입력 및 출력 장치를 바쁘게 유지하기 위해 빠른 응답 시간이 필요하지만 CPU 시간은 거의 필요하지 않는다.
반면에 컴퓨팅에 바인딩된 스레드는 작업을 완료하기 위해 많은 CPU 시간이 필요하지만 빠른 응답 시간은 필요하지 않습니다.
다른 스레드는 이 둘의 중간에 위치하며, I/O 주기가 계산 주기로 구분되므로 시간에 따라 요구 사항이 달라집니다. 잘 설계된 스케줄러는 이러한 모든 요구 사항을 가진 스레드를 동시에 수용할 수 있는 경우가 많습니다.

프로젝트 1의 경우 이 부록에 설명된 스케줄러를 구현해야 합니다. 이 스케줄러는 다단계 피드백 대기열 스케줄러의 한 예인 [McKusick]에 설명된 것과 유사합니다. 이 유형의 스케줄러는 실행 준비가 완료된 스레드의 여러 대기열을 유지하며, 각 대기열에는 우선순위가 다른 스레드가 보관됩니다. 스케줄러는 언제든지 우선순위가 가장 높은 비어 있지 않은 대기열에서 스레드를 선택합니다. 우선순위가 가장 높은 대기열에 여러 개의 스레드가 포함되어 있으면 스레드가 '라운드 로빈' 순서로 실행됩니다.

스케줄러의 여러 측면에서는 특정 수의 타이머 틱 후에 데이터를 업데이트해야 합니다. 모든 경우에 이러한 업데이트는 일반 커널 스레드가 실행될 기회를 갖기 전에 이루어져야 커널 스레드가 새로 증가한 timer_ticks() 값 대신 이전 스케줄러 데이터 값을 볼 수 있는 가능성이 없습니다.

4.4BSD 스케줄러에는 우선순위 기부가 포함되어 있지 않습니다.

Niceness

Thread priority is dynamically determined by the scheduler using a formula given below. However, each thread also has an integer nice value that determines how "nice" the thread should be to other threads. A nice of zero does not affect thread priority. A positive nice, to the maximum of 20, decreases the priority of a thread and causes it to give up some CPU time it would otherwise receive. On the other hand, a negative nice, to the minimum of -20, tends to take away CPU time from other threads.

The initial thread starts with a nice value of zero. Other threads start with a nice value inherited from their parent thread. You must implement the functions described below, which are for use by test programs. We have provided skeleton definitions for them in threads/thread.c

int thread_get_nice (void);

Returns the current thread's nice value.

int thread_set_nice (int nice);

Sets the current thread's nice value to new nice and recalculates the thread's priority based on the new value (see Calculating Priority). If the running thread no longer has the highest priority, yields.

Niceness

스레드 우선순위는 아래 제공된 공식을 사용하여 스케줄러에 의해 동적으로 결정됩니다. 그러나 각 스레드에는 스레드가 다른 스레드에 대해 얼마나 "좋은"지 결정하는 정수 nice 값도 있습니다. 0이라는 좋은 값은 스레드 우선순위에 영향을 주지 않습니다. 최대 20까지 긍정적인 nice는 스레드의 우선 순위를 낮추고 스레드가 받을 수 있는 일부 CPU 시간을 포기하게 만듭니다. 반면에 최소 -20의 음수 nice는 다른 스레드에서 CPU 시간을 빼앗는 경향이 있습니다.

초기 스레드는 좋은 값인 0으로 시작합니다. 다른 스레드는 상위 스레드에서 상속된 멋진 값으로 시작합니다. 테스트 프로그램에서 사용하기 위한 아래 설명된 기능을 구현해야 합니다. 우리는 이에 대한 뼈대 정의를 제공했습니다.threads/thread.c

int thread_get_nice (void);

현재 스레드의 nice 값을 반환합니다.

int thread_set_nice (int nice);

현재 스레드의 nice 값을 새 nice로 설정하고 새 값을 기반으로 스레드의 우선순위를 다시 계산합니다( 우선순위 계산 참조 ). 실행 중인 스레드가 더 이상 가장 높은 우선순위를 갖지 않으면 항복합니다.

Calculating Priority

Our scheduler has 64 priorities and thus 64 ready queues, numbered 0 (PRI_MIN) through 63 (PRI_MAX). Lower numbers correspond to lower priorities, so that priority 0 is the lowest priority and priority 63 is the highest. Thread priority is calculated initially at thread initialization. It is also recalculated once every fourth clock tick, for every thread. In either case, it is determined by the formula:

priority = PRI_MAX - (recent_cpu / 4) - (nice * 2),

where recent cpu is an estimate of the CPU time the thread has used recently (see below) and nice is the thread's nice value. The result should be rounded down to the nearest integer (truncated). The coefficients 1/4 and 2 on recent cpu and nice, respectively, have been found to work well in practice but lack deeper meaning. The calculated priority is always adjusted to lie in the valid range PRI_MIN to PRI_MAX.

This formula gives a thread that has received CPU time recently lower priority for being reassigned the CPU the next time the scheduler runs. This is key to preventing starvation: a thread that has not received any CPU time recently will have a recent cpu of 0, which barring a high nice value should ensure that it receives CPU time soon.

Calculating Priority

스케줄러에는 64개의 우선 순위가 있으므로 0 (PRI_MIN) 부터 까지 번호가 매겨진 64개의 준비 대기열이 있습니다 63 (PRI_MAX). 낮은 숫자는 낮은 우선순위에 해당하므로 우선순위 0이 가장 낮은 우선순위이고 우선순위 63이 가장 높습니다. 스레드 우선순위는 스레드 초기화 시 처음에 계산됩니다. 또한 모든 스레드에 대해 4번째 클록 틱마다 한 번씩 다시 계산됩니다. 두 경우 모두 다음 공식에 의해 결정됩니다.

우선순위 = PRI_MAX - (recent_cpu / 4) - (nice * 2),

여기서 recent cpu는 스레드가 최근에 사용한 CPU 시간의 추정치(아래 참조)이고 nice는 스레드의 nice 값입니다. 결과는 가장 가까운 정수로 반올림되어야 합니다(잘림). recent cpu와 nice의 계수 1/4와 2는 실제로는 잘 작동하는 것으로 밝혀졌지만 더 깊은 의미는 부족합니다. 계산된 우선순위는 항상 유효한 범위에 있도록 조정 됩니다 PRI_MIN to PRI_MAX.

이 공식은 최근에 CPU 시간을 받은 스레드에 스케줄러가 다음에 실행될 때 CPU를 다시 할당하기 위해 낮은 우선순위를 제공합니다. 이것이 기아를 방지하는 열쇠입니다. 최근에 CPU 시간을 받지 못한 스레드는 recent_cpu가 0이 되며, 높은 nice 값을 제외하면 곧 CPU 시간을 받을 수 있습니다.

Calculating `recent_cpu`

We wish recent cpu to measure how much CPU time each process has received "recently." Furthermore, as a refinement, more recent CPU time should be weighted more heavily than less recent CPU time. One approach would use an array of n elements to track the CPU time received in each of the last n seconds. However, this approach requires O(n) space per thread and O(n) time per calculation of a new weighted average.

Instead, we use a exponentially weighted moving average, which takes this general form:

x(0) = f(0),

x(t) = ax(t − 1) + (1 − a)f(t),

a = k/(k + 1),

where x(t) is the moving average at integer time t ≥ 0, f(t) is the function being averaged, and k > 0 controls the rate of decay. We can iterate the formula over a few steps as follows:

x(1) = f(1),

x(2) = af(1) + f(2),

...

x(5) = a^4 f(1) + a^3 f(2) + a^2 f(3) + a f(4) + f(5).

The value of f(t) has a weight of 1 at time t, a weight of a at time t + 1, a^2 at time t + 2, and so on. We can also relate x(t) to k: f(t) has a weight of approximately 1/e at time t + k, approximately 1/(e^2) at time t + 2k, and so on. From the opposite direction, f(t) decays to weight w at time t + log_a(w).

The initial value of recent_cpu is 0 in the first thread created, or the parent's value in other new threads. Each time a timer interrupt occurs, recent_cpu is incremented by 1 for the running thread only, unless the idle thread is running. In addition, once per second the value of recent cpu is recalculated for every thread (whether running, ready, or blocked), using this formula:

recent_cpu = (2 load_avg)/(2 load_avg + 1) * recent_cpu + nice

where loadavg is a moving average of the number of threads ready to run (see below). If load_avg is 1, indicating that a single thread, on average, is competing for the CPU, then the current value of recent cpu decays to a weight of .1 in log(2/3) .1 ≈ 6 seconds; if load avg is 2, then decay to a weight of .1 takes log_(3/4) .1 ≈ 8 seconds. The effect is that recent cpu estimates the amount of CPU time the thread has received "recently," with the rate of decay inversely proportional to the number of threads competing for the CPU.

Assumptions made by some of the tests require that these recalculations of recent cpu be made exactly when the system tick counter reaches a multiple of a second, that is, when timer_ticks () % TIMER_FREQ == 0, and not at any other time.

The value of recent_cpu can be negative for a thread with a negative nice value. Do not clamp negative recent_cpu to 0.

You may need to think about the order of calculations in this formula. We recommend computing the coefficient of recent cpu first, then multiplying. Some students have reported that multiplying load_avg by recent_cpu directly can cause overflow.

You must implement thread_get_recent_cpu(), for which there is a skeleton in threads/thread.c.

int thread_get_recent_cpu (void);

Returns 100 times the current thread's recent cpu value, rounded to the nearest integer

Calculating `recent_cpu`

우리는 recent_cpu가 각 프로세스가 "최근" 받은 CPU 시간을 측정하기를 원합니다. 또한 개선 사항으로 최신 CPU 시간은 최근 CPU 시간보다 더 많은 가중치를 부여해야 합니다. 한 가지 접근 방식은 n 요소의 배열을 사용하여 마지막 n초마다 수신된 CPU 시간을 추적하는 것입니다. 그러나 이 접근 방식에는 스레드당 O(n) 공간과 새로운 가중 평균 계산당 O(n) 시간이 필요합니다.

대신에 우리는 다음과 같은 일반적인 형식을 취하는 지수 가중 이동 평균을 사용합니다.

x(0) = f(0),

x(t) = ax(t − 1) + (1 − a)f(t),

a = k/(k + 1),

여기서 x(t)는 정수 시간의 이동 평균이고 t ≥ 0, f(t)는 평균화되는 함수이며 k > 0는 감쇠 속도를 제어합니다. 다음과 같이 몇 단계에 걸쳐 공식을 반복할 수 있습니다.

x(1) = f(1),

x(2) = af(1) + f(2),

...

x(5) = a^4 f(1) + a^3 f(2) + a^2 f(3) + a f(4) + f(5).

의 값은 f(t)time 에서 1의 가중치를 가지며 , at time , at time 등 t의 가중치를 갖습니다. 우리는 또한 다음과 관련될 수 있습니다 : 대략 적인 시간 , 대략적인 시간 등 의 가중치를 가집니다 . 반대 방향에서는 시간에 따라 무게가 감소합니다 .at + 1a^2t + 2x(t)kf(t)1/et + k1/(e^2)t + 2kf(t)wt + log_a(w)

생성된 첫 번째 스레드에서는 초기 값이 recent_cpu0이고 다른 새 스레드에서는 상위 값입니다. 타이머 인터럽트가 발생할 때마다 recent_cpu유휴 스레드가 실행 중이 아닌 이상 실행 중인 스레드에 대해서만 1씩 증가됩니다. 또한 다음 공식을 사용하여 초당 한 번씩 모든 스레드(실행 중, 준비 또는 차단 여부)에 대해 최근 CPU 값이 다시 계산됩니다.

최근_cpu = (2 load_avg)/(2 load_avg + 1) * Recent_cpu + nice

loadavg실행 준비가 된 스레드 수의 이동 평균은 어디에 있습니까(아래 참조). 가 1 이면 load_avg평균적으로 단일 스레드가 CPU를 놓고 경쟁하고 있음을 나타내며 최근 CPU의 현재 값은 초 단위로 가중치가 감소 .1합니다 log(2/3) .1 ≈ 6. 로드 평균이 2이면 .1몇 log_(3/4) .1 ≈ 8초 만에 가중치로 감소됩니다. 그 결과 최근 CPU는 스레드가 "최근" 받은 CPU 시간의 양을 추정하며, 감소율은 CPU를 놓고 경쟁하는 스레드 수에 반비례합니다.

일부 테스트에서는 시스템 틱 카운터가 1초의 배수에 도달할 때, 즉 , timer_ticks () % TIMER_FREQ == 0다른 시간에 도달하지 않을 때 최근 CPU를 재계산해야 한다고 가정합니다.

recent_cpu음수 nice 값을 가진 스레드의 경우 값은 음수일 수 있습니다. recent_cpu음수를 0으로 고정하지 마십시오 .

이 수식의 계산 순서에 대해 생각해야 할 수도 있습니다. 먼저 최근 CPU의 계수를 계산한 다음 곱하는 것이 좋습니다. load_avg일부 학생들은 직접 곱하면 오버플로가 발생할 수 있다고 보고했습니다 recent_cpu.

thread_get_recent_cpu()에 뼈대가 있는 을 구현해야 합니다 threads/thread.c.

int thread_get_recent_cpu (void);

현재 스레드의 최근 CPU 값의 100배를 가장 가까운 정수로 반올림하여 반환합니다.

Calculating `load_avg`

Finally, load avg, often known as the system load average, estimates the average number of threads ready to run over the past minute. Like recent_cpu, it is an exponentially weighted moving average. Unlike priority and recent_cpu, load_avg is system-wide, not thread-specific. At system boot, it is initialized to 0. Once per second thereafter, it is updated according to the following formula:

load_avg = (59/60) load_avg + (1/60) ready_threads,

where ready threads is the number of threads that are either running or ready to run at time of update (not including the idle thread).

Because of assumptions made by some of the tests, load_avg must be updated exactly when the system tick counter reaches a multiple of a second, that is, when timer_ticks () % TIMER_FREQ == 0, and not at any other time. You must implement thread_get_load_avg(), for which there is a skeleton in threads/thread.c.

int thread_get_load_avg (void)

Returns 100 times the current system load average, rounded to the nearest integer.

Calculating `load_avg`

마지막으로, 시스템 로드 평균이라고도 알려진 로드 평균은 지난 몇 분 동안 실행할 준비가 된 평균 스레드 수를 추정합니다. recent_cpu와 마찬가지로 지수가중이동평균입니다. 우선 순위 및 recent_cpu와 달리 load_avg는 스레드 별이 아닌 시스템 전체에 적용됩니다. 시스템 부팅 시 0으로 초기화됩니다. 이후 초당 한 번씩 다음 공식에 따라 업데이트됩니다.

load_avg = (59/60) load_avg + (1/60) Ready_threads,

여기서 준비된 스레드는 업데이트 시 실행 중이거나 실행할 준비가 된 스레드 수입니다(유휴 스레드 제외).

일부 테스트의 가정으로 인해 load_avg는 시스템 틱 카운터가 1초의 배수에 도달할 때, 즉 가 업데이트될 때 정확하게 업데이트되어야 하며 timer_ticks () % TIMER_FREQ == 0다른 시간에는 업데이트되지 않습니다. thread_get_load_avg()에 뼈대가 있는 을 구현해야 합니다 threads/thread.c.

일부 테스트에서 가정한 것 때문에 load_avg는 시스템 틱 카운터가 1초의 배수에 도달할 때, 즉 timer_ticks () % TIMER_FREQ == 0일 때 정확히 업데이트되어야 하며 다른 시간에는 업데이트되지 않아야 합니다. threads/thread.c에 뼈대가 있는 thread_get_load_avg()를 구현해야 합니다.

int thread_get_load_avg (void)

현재 시스템 로드 평균의 100배를 가장 가까운 정수로 반올림하여 반환합니다.

Summary

The following formulas summarize the calculations required to implement the scheduler. They are not a complete description of scheduler requirements.

Every thread has a nice value between -20 and 20 directly under its control. Each thread also has a priority, between 0 (PRI_MIN) through 63 (PRI_MAX), which is recalculated using the following formula every fourth tick:

priority = PRI_MAX - (recent_cpu / 4) - (nice * 2)

recent_cpu measures the amount of CPU time a thread has received "recently." On each timer tick, the running thread's recent cpu is incremented by 1. Once per second, every thread's recent cpu is updated this way:

recent_cpu = (2 load_avg) / (2 load_avg + 1) * recent_cpu + nice

load_avg estimates the average number of threads ready to run over the past minute. It is initialized to 0 at boot and recalculated once per second as follows:

load_avg = (59/60) load_avg + (1/60) ready_threads

where ready_threads is the number of threads that are either running or ready to run at time of update (not including the idle thread).

Summary

다음 수식은 스케줄러를 구현하는 데 필요한 계산을 요약한 것입니다. 이는 스케줄러 요구 사항에 대한 완전한 설명이 아닙니다.

모든 스레드는 -20에서 20 사이의 nice값을 직접 제어할 수 있습니다. 또한 각 스레드에는 0(PRI_MIN)에서 63(PRI_MAX) 사이의 우선 순위가 있으며, 이는 매 4번째 틱마다 다음 공식을 사용하여 다시 계산됩니다:

우선순위 = PRI_MAX - (recent_cpu / 4) - (nice * 2)

recent_cpu는 스레드가 "최근에" 받은 CPU 시간을 측정합니다. 타이머 틱에서 실행 중인 스레드의 recent_cpu는 1씩 증가합니다. 초당 한 번씩 모든 스레드의 최근 CPU는 다음과 같이 업데이트됩니다.

최근_cpu = (2 load_avg) / (2 load_avg + 1) * Recent_cpu + nice

load_avg는 지난 몇 분 동안 실행할 준비가 된 평균 스레드 수를 추정합니다. 부팅 시 0으로 초기화되고 다음과 같이 초당 한 번씩 다시 계산됩니다.

load_avg = (59/60) load_avg + (1/60) Ready_threads

여기서는 ready_threads업데이트 시 실행 중이거나 실행할 준비가 된 스레드 수입니다(유휴 스레드 제외).

Fixed-Point Real Arithmetic

In the formulas above, priority, nice, and ready_threads are integers, but recent_cpu and load_avg are real numbers. Unfortunately, Pintos does not support floating-point arithmetic in the kernel, because it would complicate and slow the kernel. Real kernels often have the same limitation, for the same reason. This means that calculations on real quantities must be simulated using integers. This is not difficult, but many students do not know how to do it. This section explains the basics.

The fundamental idea is to treat the rightmost bits of an integer as representing a fraction. For example, we can designate the lowest 14 bits of a signed 32-bit integer as fractional bits, so that an integer x represents the real number x/(2^14). This is called a 17.14 fixed-point number representation, because there are 17 bits before the decimal point, 14 bits after it, and one sign bit. A number in 17.14 format represents, at maximum, a value of (2^31 − 1)/(2^14) ≈ 131,071.999.

Suppose that we are using a p.q fixed-point format, and let f = 2^q. By the definition above, we can convert an integer or real number into p.q format by multiplying with f. For example, in 17.14 format the fraction 59/60 used in the calculation of load_avg, above, is (59/60)2^14 = 16,110. To convert a fixed-point value back to an integer, divide by f. (The normal / operator in C rounds toward zero, that is, it rounds positive numbers down and negative numbers up. To round to nearest, add f / 2 to a positive number, or subtract it from a negative number, before dividing.)

Many operations on fixed-point numbers are straightforward. Let x and y be fixed-point numbers, and let n be an integer. Then the sum of x and y is x + y and their difference is x - y. The sum of x and n is x + n f; difference, x - n f; product, x * n; quotient, x / n.

Multiplying two fixed-point values has two complications. First, the decimal point of the result is q bits too far to the left. Consider that (59/60)(59/60) should be slightly less than 1, but 16,111 × 16,111 = 259,564,321 is much greater than 2^14 = 16,384. Shifting q bits right, we get 259,564,321/2^14 = 15,842, or about 0.97, the correct answer. Second, the multiplication can overflow even though the answer is representable. For example, 64 in 17.14 format is 64 × 2^14 = 1,048,576 and its square 64^2 = 4,096 is well within the 17.14 range, but 1,048,576^2 = 2^40, greater than the maximum signed 32-bit integer value 2^31 − 1. An easy solution is to do the multiplication as a 64-bit operation. The product of x and y is then ((int64_t) x) * y / f.

Dividing two fixed-point values has opposite issues. The decimal point will be too far to the right, which we fix by shifting the dividend q bits to the left before the division. The left shift discards the top q bits of the dividend, which we can again fix by doing the division in 64 bits. Thus, the quotient when x is divided by y is ((int64_t) x) * f / y.

This section has consistently used multiplication or division by f, instead of q-bit shifts, for two reasons. First, multiplication and division do not have the surprising operator precedence of the C shift operators. Second, multiplication and division are well-defined on negative operands, but the C shift operators are not. Take care with these issues in your implementation.

The following table summarizes how fixed-point arithmetic operations can be implemented in C. In the table, x and y are fixed-point numbers, n is an integer, fixed-point numbers are in signed p.q format where p + q = 31, and f is 1 << q:

Arithmetic	C
Convert n to fixed point	`n * f`
Convert x to integer (rounding toward zero)	`x / f`
Convert x to integer (rounding to nearest)	`(x + f / 2) / f` if `x >= 0` `(x - f / 2) / f` if `x <= 0`
Add x and y	`x + y`
Subtract y from x	`x - y`
Add x and n	`x + n * f`
Subtract n from x	`x - n * f`
Multiply x by y	`((int64_t) x) * y / f`
Multiply x by n	`x * n`
Divide x by y	`((int64_t) x) * f / y`
Divide x by n	`x / n`

실수 고정 소수점 연산

priority, nice및 ready_threads는 정수이지만 recent_cpu및 load_avg는 실수다.
커널이 복잡해지고 느려지기 때문에 Pintos는 커널에서 부동 소수점 연산을 지원하지 않는다.
실제 커널에서도 같은 이유로 커널에서 부동 소수점 연산을 지원하지 않는 경우가 있다. 즉, 정수를 사용하여 수를 계산해야한다.

기본적인 아이디어는 정수의 가장 오른쪽 비트를 분수를 나타내는 것으로 처리하는 것입니다. 예를 들어, 부호 있는 32비트 정수의 최하위 14비트를 분수 비트로 지정하여 정수 x가 실수 x/(2^14)를 나타내도록 할 수 있습니다. 소수점 앞에 17비트, 소수점 뒤에 14비트, 부호 비트가 1개가 있기 때문에 이를 17.14 고정 소수점 숫자 표현이라고 합니다. 17.14 형식의 숫자는 최대 (2^31 - 1)/(2^14) ≈ 131,071.999의 값을 나타냅니다.

p.q 고정 소수점 형식을 사용한다고 가정하고 f = 2^q라고 가정합니다. 위의 정의에 따라 정수 또는 실수에 f를 곱하여 p.q 형식으로 변환할 수 있습니다. 예를 들어, 17.14 형식에서 위의 load_avg 계산에 사용된 분수 59/60은 (59/60)2^14 = 16,110입니다. 고정 소수점 값을 정수로 다시 변환하려면 f로 나눕니다. (C의 정규 / 연산자는 0을 향해 반올림, 즉 양수를 내림하고 음수를 올림합니다. 가장 가까운 값으로 반올림하려면 나누기 전에 양수에 f / 2를 더하거나 음수에서 빼면 됩니다.)

고정 소수점 수에 대한 많은 연산은 간단합니다. x와 y를 고정 소수점 수로 하고 n을 정수로 하자. 그러면 x와 y의 합은 x + y이고 그 차는 x - y입니다. x와 n의 합은 x + n f; 차, x - n f; 곱, x * n; 몫, x / n입니다.

두 개의 고정 소수점 값을 곱하는 데는 두 가지 복잡한 문제가 있습니다. 첫째, 결과의 소수점이 왼쪽으로 너무 멀리 떨어져 있습니다. (59/60)(59/60)은 1보다 약간 작아야 하지만 16,111 × 16,111 = 259,564,321이 2^14 = 16,384보다 훨씬 크다고 생각해보십시오. q 비트를 오른쪽으로 이동하면 259,564,321/2^14 = 15,842, 즉 약 0.97이 정답이 됩니다. 둘째, 답이 표현 가능한데도 곱셈이 넘칠 수 있습니다. 예를 들어 17.14 형식의 64는 64 × 2^14 = 1,048,576이고 그 제곱 64^2 = 4,096은 17.14 범위 내에 있지만 1,048,576^2 = 2^40으로 최대 부호 32비트 정수 값 2^31 - 1보다 큽니다. 쉬운 해결책은 곱셈을 64비트 연산으로 하는 것입니다. 그러면 x와 y의 곱은 ((int64_t) x) * y / f.

두 개의 고정 소수점 값을 나누면 정반대의 문제가 발생합니다. 소수점이 너무 오른쪽에 위치하게 되는데, 이는 나누기 전에 나누기 q 비트를 왼쪽으로 이동하여 해결합니다. 왼쪽으로 이동하면 나눗셈의 상위 q 비트가 버려지는데, 이를 다시 64비트로 나눗셈을 수행하여 해결할 수 있습니다. 따라서 x를 y로 나눌 때의 몫은 ((int64_t) x) * f / y입니다.

이 섹션에서는 두 가지 이유로 큐비트 시프트 대신 곱셈 또는 f로 나눗셈을 일관되게 사용했습니다. 첫째, 곱셈과 나눗셈에는 C 시프트 연산자처럼 놀랍게도 연산자 우선 순위가 없습니다. 둘째, 곱셈과 나눗셈은 음의 피연산자에 대해 잘 정의되어 있지만 C 시프트 연산자는 그렇지 않습니다. 구현할 때 이러한 문제에 주의하세요.

다음 표는 C에서 고정 소수점 산술 연산을 구현하는 방법을 요약한 것입니다. 표에서 x와 y는 고정 소수점 숫자, n은 정수, 고정 소수점은 부호가 있는 p.q 형식(p + q = 31), f는 1 << q입니다:

Arithmetic	C
Convert n to fixed point	`n * f`
Convert x to integer (rounding toward zero)	`x / f`
Convert x to integer (rounding to nearest)	`(x + f / 2) / f` if `x >= 0` `(x - f / 2) / f` if `x <= 0`
Add x and y	`x + y`
Subtract y from x	`x - y`
Add x and n	`x + n * f`
Subtract n from x	`x - n * f`
Multiply x by y	`((int64_t) x) * y / f`
Multiply x by n	`x * n`
Divide x by y	`((int64_t) x) * f / y`
Divide x by n	`x / n`

고정 소수점 예시

기준 값(f)으로 2^14 (16384)을 사용
소수점 아래로 14비트의 정밀도를 가지는 것과 같다.

예시: 고정 소수점 연산

1. 고정 소수점으로 변환

정수 3을 고정 소수점으로 변환하기: 3 * 16384 = 49152
실수 2.5을 고정 소수점으로 변환하기: 2.5 * 16384 = 40960

2. 고정 소수점 덧셈

1.5와 2.25를 더하기:
- 1.5를 고정 소수점으로 변환: 1.5 * 16384 = 24576
- 2.25를 고정 소수점으로 변환: 2.25 * 16384 = 36864
- 덧셈: 24576 + 36864 = 61440
- 고정 소수점 결과를 실수로 변환: 61440 / 16384 = 3.75

3. 고정 소수점 곱셈

1.5와 2.0을 곱하기:
- 1.5를 고정 소수점으로 변환: 24576
- 2.0을 고정 소수점으로 변환: 2.0 * 16384 = 32768
- 곱셈: ((int64_t) 24576) * 32768 / 16384 = 49152
- 고정 소수점 결과를 실수로 변환: 49152 / 16384 = 3.0

4. 고정 소수점 나눗셈

3.0을 2.0으로 나누기:
- 3.0을 고정 소수점으로 변환: 3.0 * 16384 = 49152
- 2.0을 고정 소수점으로 변환: 32768
- 나눗셈: ((int64_t) 49152) * 16384 / 32768 = 24576
- 고정 소수점 결과를 실수로 변환: 24576 / 16384 = 1.5

고정 소수점 연산을 사용할 때는 소수점의 위치를 고려하여 연산을 수행해야 하며, 연산의 결과로 나온 고정 소수점 숫자를 다시 실수로 변환할 때는 기준 값으로 나누어주어야 합니다. 이러한 과정을 통해 실수 연산을 정수 연산으로 변환하여 처리할 수 있습니다, 어떤 연산에서는 중간 결과가 매우 크거나 작을 수 있으므로, 오버플로우나 언더플로우를 주의해야 합니다.

Outline of Advanced Scheduler

Main Goal
- Implement 4.4 BSD scheduler MLFQ like scheduler without the queues.
- 대기열이 없는 스케줄러와 같은 4.4 BSD 스케줄러 MLFQ를 구현합니다.
  - Give priority to the processes with interactive nature.
  - 대화형 특성을 가진 프로세스에 우선순위 부여
  - Priority based scheduler
  - 기본적으로 우선순위 기반 스케줄러이다.
- Use “equation”.
- 방정식 사용하여 우선순위를 계산한다.
File to modify
- threads/thread.*
- devices/timer.c

nice

Nice value
- Represents the ‘niceness’ of a thread.
- 스레드의 'niceness'을 나타냅니다.
- If a thread is nicer, it is willing to give up some of its CPU time.
- 스레드가 nice하다는 것은 CPU 시간을 일부 포기할 의향이 있다는 것이다.
Value (from -20 to 20)
- Nice (0) : not influence on priority. (initial value)
- 좋음(0): 우선순위에 영향을 미치지 않습니다. (초기값)
- Nice (positive) : decrease priority.
- 좋음(양수): 우선순위를 낮춤.
- Nice (negative) : increase priority.
- 나쁨(음수): 우선순위를 높임.
Function
- thread_get_nice()
- thread_set_nice(int new_nice)

Priority

Priority
- From 0 (PRI_MIN) to 63 (PRI_MAX)
- Unsigned Integer 타입이다.
- The larger the number, the higher the priority.
- 숫자가 클수록 우선순위가 높습니다.
- It initialized when thread is created (default: 31)
- 스레드가 생성될 때 초기화됩니다(기본값: 31).
  priority = PRI_MAX – (recent_cpu / 4) – (nice * 2)
Philosophy
- If the thread is nicer, lower the priority.
- 스레드가 더 좋은 경우 우선순위를 낮춥니다.
- If the thread have been using lots of CPU recently, lower the priority.
- 스레드가 최근 CPU를 많이 사용했다면 우선순위를 낮춥니다.
- For all threads, priority is recalculated once in every fourth clock tick.
- 모든 스레드의 경우 4클럭마다 우선순위를 다시 계산합니다.
- The result is truncated to its nearest integer.
- 결과는 가장 가까운 정수로 잘립니다.

CPU usage

Update the cpu usage
CPU 사용량 업데이트
recent_cpu
Increase the recent_cpu of the currently running process by 1 in every timer interrupt.
매 timer interrupt마다 현재 실행중인 프로세스의 recent_cpu 값을 1씩 증가시킨다.
타이머 인터럽트는 내가 인터럽트할 때 마다 현재 실행 중인 프로세스의 recent_cpu를 1씩 증가 시킨다.

Decay
프로세스가 오래전에 cpu를 여러번 사용했음에도 방지하는 매커니즘, 1보다 작다
Decay recent_cpu by decay factor in every second.
매초마다 감쇠 계수로 recent_cpu값을 감소시킨다.
recent_cpu = decay * recent_cpu
Adjust recent_cpu by nice in every second.
매초마다 nice로 recent_cpu를 조정해라
recent_cpu = recent_cpu + nice
Putting them together,
조합하기
recent_cpu = decay * recent_cpu + nice

Decay factor

In SVR3

decay = ½
In BSD4.4
- In heavy load, decay is nearly 1.
- 부하가 많으면 감쇠는 1에 가까워진다.
- In light load, decay is 0.
- 부하가 적으면 감쇠는 0이다.
  
  decay = (2*load_average)/ (2*load_average + 1)

load _average

시스템이 얼마나 바쁜지 나타낸다.

load_avg = (59/60)*load_avg + (1/60)*ready_threads

load_avg는 load_avg와 ready_threads의 가중 평균이다.

ready_threads는 ready_list의 스레드 수와 업데이트시 실행중인 스레드의 수를 더한 것이다.

At booting, load_avg is initially set to 0.
부팅시 load_avg는 0으로 설정된다.
ready_threads: the number of threads in ready_list and threads in executing at the time of update. (except idle thread)
ready_threads: ready_list의 스레드 수와 업데이트 시 실행 중인 스레드 수입니다. (유휴 스레드 제외)

In summary,

In every fourth tick, recompute the priority of all threads
네 번째 틱마다 모든 스레드의 우선순위를 다시 계산한다.

priority = PRI_MAX – (recent_cpu / 4) – (nice * 2)

In every clock tick, increase the running thread’s recent_cpu by one.
매 클럭마다 실행중인 스레드의 recent_cpu값을 1씩 증가시킨다.

In every second, update every thread’s recent_cpu
매초마다 모든 스레드의 recent_cpu를 업데이트 한다.

recent_cpu = decay * recent_cpu + nice, where

decay = (2*load_average)/ (2*load_average + 1)

and

load_avg = (59/60)*load_avg + (1/60)*ready_threads

Implement fixed point arithmetic

Inside kernel, you can do only integer arithmetic.
커널 내부에서는 정수 연산만 할 수 있다.
- Kernel does not save floating point register when switching the context.
- 커널은 컨텍스트를 전환할 때 부동 소수점 레지스터를 저장하지 않는다.
  - We need to implement fixed point arithmetic using integer arithmetic.
우리는 정수연산을 사용해서 고정소수점 연산을 구현해야한다.
- priority, nice, ready_threads value is integer, and recent_cpu, load_avg value is real number.
- priority, nice, ready_threads 값은 정수이고, recent_cpu, load_avg는 실수이다.
- Implement the fixed-pint arithmetic using 17.14 fixed-point number representation.
- 17.14 고정 소수점 숫자 표현방법을 사용해서 고정소수점 연산을 구현해라
  - Decimal point is 14 right-most bits.
  - 소수점은 가장 오른쪽 14비트
  - Integer is 17 next bits to the left.
  - 정수는 왼쪽 17비트 다음
  - Last of left 1bit is sign bit.
  - 왼쪽 마지막 1비트는 부호비트

Operations to implement

업로드중..

Examples

Convert n to fixed point: n * f: shift n by 14 bits to the left.
n을 고정 소수점으로 변환 : n * f : n을 왼쪽을 14비트 이동
Convert x to integer: shift x by 14 to the right.
x를 정수로 변환 : x를 14비트 오른쪽으로 이동

Implement

Each thread maintains
각 스레드가 유지해야할것
- nice and recent_cpu
- Add nice, recent_cpu to thread structure.
- nice, recent_cpu를 thread 구조에 추가하기
Functions to be added
- **The function that calculate priority using recent_cpu and nice.
- recent_cpu와 nice를 사용하여 우선순위를 계산하는 함수
- The function that calculate recent_cpu.
- recent_cpu를 계산하는 함수
- The function that calculate load_avg.
- load_avg를 계산하는 함수
- The function that increase recent_cpu by 1.
- recent_cpu를 1씩 증가하는 함수
- The function that recalculate priority and recent_cpu of all threads.
- 모든 스레드의 recent_cpu와 priority를 재계산하는 함수

Multiple ready queues vs. single ready queue
다중 ready queues 와 단일 ready queue의 차이
- You can use single in implementing BSD scheduler.
- BSD 스케줄러를 구현할 때 싱글을 사용할 수 있습니다.
- If you use multiple ready queues, you will get 10 extra mark for 100 mark.
- 여러 개의 준비 대기열을 사용하는 경우 100점당 10점을 추가로 받을 수 있습니다.

Functions to modify

static void init_thread(struct thread t, const char name, int priority)
- Initialize nice, recent_cpu
- nice, recent_cpu 초기화

void thread_set_priority(int new_priority)
- Disable the priority setting when using the advanced scheduler.
- advanced scheduler를 사용할 때는 우선순위 설정을 비활성화합니다.
static void timer_interrupt(struct intr frame *args UNUSED)
- Recalculate load_avg, recent_cpu of all threads, priority every 1 sec.
- 1초마다 모든 스레드의 load_avg, recent_cpu를 다시 계산한다.
- Recalculate priority of all threads every 4th tick.
- 4번째 틱마다 모든 스레드의 우선순위를 다시 계산한다.
void lock_acquire(struct lock *lock)
- Forbid the priority donation when using the advanced scheduler.
- advanced scheduler를 사용할 때 우선 기부 기능을 금지합니다.
void lock_release(struct lock *lock)
- Forbid the priority donation when using the advanced scheduler.
- advanced scheduler를 사용할 때 우선 기부 기능을 금지합니다.
void thread_set_nice(int nice UNUSED)
- Set nice value of the current thread.
- 현재 스레드의 nice 값을 설정한다.
int thread_get_nice(void)
- Return nice value of the current thread.
- 현재 스레드의 nice 값을 반환한다.
int thread_get_load_avg(void)
- Return load_avg multiplied by 100
- load_avg에 100을 곱한 값을 반환합니다.
- timer_ticks() % TIMER_FREQ == 0
int thread_get_recent_cpu(void)
- Rerturn recent_cpu multiplied by 100
- recent_cpu에 100을 곱한값을 반환한다.

구현

우선순위를 다르게 설정하기 때문에 advanced scheduler를 사용할 때는 우선순위 설정을 비활성화 해준다.

void thread_set_priority(int new_priority)
{
	// 우선순위 설정 비활성화
	if (thread_mlfqs)
		return;
    ,,,
}

load_avg는 시스템 틱 카운터가 1초의 마다 업데이트 되어야 하므로 1초마다 load_avg를 계산할 수 있게 해준다.

static void timer_interrupt (struct intr_frame *args UNUSED) {
		// timer_ticks 는 10ms 마다 호출되므로 10ms 마다 load_avg를 업데이트
	if(timer_ticks()%TIMER_FREQ==0)
		calculate_load_avg();

실행할 준비가 된 평균 스레드 수를 구하기 위해서 load_avg를 전역 변수로 선언해준다.
int load_avg;

PintOS에서 복잡도와 실행속도로 인해 커널에서 부동 소수점 연산을 지원하지 않기 때문에 정수를 사용하여 수를 계산해야한다.

부동 소수점 연산은 매우 복잡하며 부동 소수점 연산을 위한 하드웨어인 (Floating Point Unit, FPU)의 지원이 있어야 하는데 오래된 시스템이나 임베디드 시스템에는 없는 경우가 있기때문에 고정소수점 연산을 사용하기도 한다.

분자는 F가 가능하지만 분모는 안된다

/* Sets the current thread's nice value to NICE. */
void thread_set_nice(int nice)
{
	thread_current()->nice = nice;
}

/* Returns the current thread's nice value. */
int thread_get_nice(void)
{
	
	return thread_current()->nice;
}

이재원

최고가 되기 위한 여정

다음 포스트

알람시계 구현

1개의 댓글

문성준

2024년 3월 8일

댓글을 안달수가 없네요 개추드립니다!

답글 달기

Pintos Project1-2 BSD

PintOS

Advanced Scheduler

Implement a multilevel feedback queue scheduler similar to the 4.4BSD scheduler to reduce the average response time for running jobs on your system.

Advanced Scheduler

4.4BSD scheduler와 유사한 다중 레벨 피드백 큐 스케줄러를 구현하여 시스템에서 작업을 실행하는 데 필요한 평균 응답 시간을 줄입니다.

4.4BSD Scheduler

4.4BSD 스케줄러

Niceness

Niceness

Calculating Priority

Calculating Priority

Calculating recent_cpu

Returns 100 times the current thread's recent cpu value, rounded to the nearest integer

Calculating recent_cpu

Calculating load_avg

Calculating load_avg

Summary

Summary

Fixed-Point Real Arithmetic

실수 고정 소수점 연산

고정 소수점 예시

예시: 고정 소수점 연산

1. 고정 소수점으로 변환

2. 고정 소수점 덧셈

3. 고정 소수점 곱셈

4. 고정 소수점 나눗셈

Outline of Advanced Scheduler

nice

Priority

CPU usage

Decay factor

load _average

In summary,

Implement fixed point arithmetic

Operations to implement

Examples

Implement

Functions to modify

구현

알람시계 구현

1개의 댓글

Calculating `recent_cpu`

Calculating `recent_cpu`

Calculating `load_avg`

Calculating `load_avg`