11. Branch Prediction 1,2

이세진·2022년 4월 3일
1

Computer Science

목록 보기
55/74

생성일: 2021년 10월 16일 오후 9:55

Control Hazards

  • Improves strategy for placing instructions in delay slot
  • Two strategies
    • Backward branch predict taken, forward branch not taken
    • Profile-based prediction: record branch behavior, predict branch based on prior run

Hazards를 제외하고 Pipelining을 어렵게 만드는 요소

  • Complex Addressing Modes and Instructions
  • Address modes: Autoincrement causes register change during instruction execution
  • Memory-Memory Move Instructions
  • Condition Codes

Handling Multi-cycle Operations

  • Multi-cycle instructions also lead to pipeline complexity.
  • A very lengthy instruction causes everything else in the pipeline to wait for it.
  • e.g. Floating Point
    • Adds WAR and WAW hazards since pipelines are no longer same length

How to Handle Control Dependences

  • Critical to keep the pipeline full with correct sequence of dynamic instructions
  • Potential Solutions
    • Stall the pipeline until we know the next fetch address
    • Backward Taken Forward Not Taken
    • Guess the next fetch address (branch prediction)
    • Employ delayed branching (branch delay slot)
    • Do something else (fine-grained multithreading)
    • Eliminate control-flow instructions (predicated execution)
    • Fetch from both possible paths (if you know the addresses of both possible paths)
      (multipath execution)

Branch Problem ⭐

Problem : Next fetch address after a control-flow instruction is not determined after N cycles in a pipelined processor

  • If we are fetching W instructions per cycle (i.e., if the pipeline is W wide)
    • A branch mis-prediction leads to N x W wasted instruction slots
  • Assume N = 20 (20 pipe stages), W = 5 (5 wide fetch)
  • Assume: 1 out of 5 instructions is a branch
  • Assume: Each 5 instruction-block ends with a branch
  • Assume: Branch direction is determined at the final pipe stage
  • How long does it take to fetch 500 instructions?
  • 100% accuracy
    • 100 cycles (all instructions fetched on the correct path)
    • No wasted work
  • 99% accuracy
    • 100 (correct path) + 20 (wrong path) = 120 cycles
    • 20% extra instructions fetched
  • 98% accuracy
    • 100 (correct path) + 20 * 2 (wrong path) = 140 cycles
    • 40% extra instructions fetched
  • 95% accuracy
    • 100 (correct path) + 20 * 5 (wrong path) = 200 cycles
    • 100% extra instructions fetched

Branch Prediction

  • Predict the next fetch address
  • Requires three things to be predicted at fetch stage
    1. Whether the fetched instruction is a branch
    2. (Conditional) branch direction
    3. Branch target address (if taken)
  • Observation: Target address remains the same for a conditional direct branch across dynamic instances
    • Idea: Store the target address from previous instance and access it with the PC
    • Called Branch Target Buffer (BTB) or Branch Target Address Cache


위의 3가지 조건중 1,3번은 BTB를 이용하여 수행 가능

그렇다면 2번인 branch direction 은?

Direction Prediction ‼️

  • Compile time (static)
    • Always not taken
      • low accuracy : ~30-40%
    • Always taken
      • Better accuracy : ~60-70%
    • BTFN (Backward taken, Forward not taken)
    • Profile based
      • requires hint bit in the branch instruction format

    • Program analysis based
    • Programmer-based

💡
1. 위의 기술들을 합칠 수 있다.
2. 공통된 단점 : Cannot adapt to dynamic changes in branch behavior

  • Run time (dynamic)
    • Last time prediction
    • Two-bit counter based prediction
    • Two-level prediction
    • Hybrid
    • Advanced algorithms

💡
1. 장점
+ Prediction based on history of the execution of branches
+ It can adapt to dynamic changes in branch behavior
+ No need for static profiling: input set representativeness problem goes away
2. 단점
- More complex (requires additional hardware)

Last Time Predictor

  • Single bit per branch (stored in BTB)
  • Indicates which direction branch went last time it executed

  • Always mispredicts the last iteration and the first iteration of a loop branch
    • Accuracy for a loop with N iterations = (N-2)/N
  • Loop branches for loops with large N (number of iterations) 일 때 유리
  • Loop branches for loops will small N (number of iterations) 일 때 불리

  • Last time predictor 는 그동안 계속 T가 나왔더라도 한번이라도 N이 나오면 다음 예측을 곧바로 N으로 바꾼다는 단점이 있다.
    • Hysteresis 를 predictor에 추가하여 하나의 다른 결과만으로는 예측을 바꾸지 않게 향상 시킴
      ⇒ Two-bit Counter based prediction

Two-bit Counter based prediction

  • A strong prediction does not change with one single different outcome
  • Accuracy for a loop with N iterations = (N-1)/N

  • ~85-90% accuracy for many programs with 2-bit counter based prediction (also called bimodal prediction)

  • Assume N = 20 (20 pipe stages), W = 5 (5 wide fetch)

  • Assume: 1 out of 5 instructions is a branch

  • Assume: Each 5 instruction-block ends with a branch

  • Assume: Branch direction is determined at the final pipe stage.

  • How long does it take to fetch 500 instructions?

  • 100% accuracy

    • 100 cycles (all instructions fetched on the correct path)
    • No wasted work
  • 95% accuracy

    • 100 (correct path) + 20 * 5 (wrong path) = 200 cycles
    • 100% extra instructions fetched
  • 90% accuracy

    • 100 (correct path) + 20 * 10 (wrong path) = 300 cycles
    • 200% extra instructions fetched
  • 85% accuracy

    • 100 (correct path) + 20 * 15 (wrong path) = 400 cycles
    • 300% extra instructions fetched

Two-Level Prediction

  • Last-time and 2BC predictors exploit “last-time” predictability
  • Realization 1: A branch’s outcome can be correlated with other branches’ outcomes
  • Realization 2: A branch’s outcome can be correlated with past outcomes of the same branch (other than the outcome of the branch“last-time” it was executed)

위와 같은 상황에서 첫번째 branch가 not taken이라면 당연히 두번째도 not taken이다.

  • Idea: Associate branch outcomes with “global T/NT history” of all branches
  • Make a prediction based on the outcome of the branch the last time the same global branch history was encountered

⇒Global history/branch predictor

  • First level: Global branch history register (N bits) (A set of local history registers)
  • Second level: Table of saturating(포화) counters for each history entry

  • An Issue: Interference in the PHTs
    • Sharing the PHTs between histories/branches leads to interference
  • Reducing Interference in PHTs (I)
    1. Increase size of PHT
    2. Branch filtering
    3. Hashing/index-reandomization

Two level local History Branch Pridictor 그림

profile
나중은 결코 오지 않는다.

0개의 댓글