13. Superscalar

이세진 · April 3, 2022


Created: November 11, 2021, 3:36 PM

Extracting More Performance

Two options

  1. Increase the depth of the pipeline to increase the clock rate - superpipelining
  2. Fetch (and execute) more than one instruction at a time - superscalar

⇒ Launching multiple instructions per stage allows the CPI (cycles per instruction) to drop below 1, i.e., more than one instruction completes per cycle (IPC > 1)

Superpipelining

Split the pipeline stages into smaller stages

  • Increase the depth of the pipeline leading to shorter clock cycles
  • A higher degree of superpipelining leads to
    • more forwarding/hazard-detection hardware
    • more unavoidable pipeline stalls from data dependencies (dependent instructions are separated by more stages)
    • a performance gain that is not linear in pipeline depth
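
Why the gain is not linear can be seen with a rough model. This is a minimal sketch, assuming a fixed total logic delay T that divides evenly into k stages, a fixed latch overhead d added to every stage, and no stalls at all (which real pipelines would add on top). The clock period becomes T/k + d, so the speedup T / (T/k + d) can never exceed T/d. The function name and the numbers are hypothetical.

```python
def superpipeline_speedup(total_logic_ns: float, stages: int, latch_ns: float) -> float:
    """Throughput speedup of a `stages`-deep pipeline over an unpipelined design.

    Assumptions: the logic delay divides evenly across the stages, every stage
    adds the same latch overhead, and there are no stalls (CPI = 1).
    """
    if stages < 1:
        raise ValueError("stages must be >= 1")
    cycle_ns = total_logic_ns / stages + latch_ns  # new clock period
    return total_logic_ns / cycle_ns               # unpipelined time / new cycle time


if __name__ == "__main__":
    # Made-up numbers: 10 ns of logic, 0.5 ns of latch overhead per stage.
    for k in (1, 5, 10, 20, 40):
        print(f"{k:>2} stages: {superpipeline_speedup(10.0, k, 0.5):.2f}x")
```

With these made-up numbers the speedup is 4.00x at 5 stages, 6.67x at 10 and 10.00x at 20, but only about 13.33x at 40, creeping toward the ceiling T/d = 20x. Counting the extra stalls listed above would flatten the curve further.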

Superpipelined vs Superscalar (SS)

  • Superpipelined processors have longer instruction latency (in clock cycles) than SS processors, which can degrade performance in the presence of true dependencies
  • Superscalar processors are more susceptible to resource conflicts, but this can be fixed with extra hardware

Multiple-Issue Processor Styles

  • Static multiple-issue processors (aka VLIW - very long instruction word architecture)
    • Which instructions execute simultaneously is decided statically (at compile time, by the compiler)
  • Dynamic multiple-issue processors (aka superscalar)
    • Which instructions execute simultaneously is decided dynamically (at run time, by the hardware)

VLIW (Very Long Instruction Word)

  • Software (compiler) packs independent instructions in a larger “instruction bundle” to be fetched and executed concurrently
  • Hardware fetches and executes the instructions in the bundle concurrently
  • No need for hardware dependency checking between concurrently fetched instructions in the VLIW model

  • Traditional Characteristics
    • Multiple functional units
    • All instructions in a bundle are executed in lock step
    • Instructions in a bundle are statically aligned so they can be fed directly into the functional units
  • Advantages
    • No need for dynamic scheduling hardware → simple hardware
    • No need for dependency checking within a VLIW instruction → simple hardware for multiple instruction issue, and no register renaming
    • No need to align/distribute instructions to the different functional units after fetch → simple hardware
  • Disadvantages
    • The compiler needs to find N independent operations per cycle
    • Recompilation is required when the execution width (N), instruction latencies, or functional units change (unlike superscalar processing)
    • Lockstep execution: when one operation in a bundle stalls, the independent operations in the same bundle stall with it
  • Summary
    • Hardware can be simplified, but complex compiler technology is required
    • Potential for performance loss
    • VLIW is successful when the compiler can easily find parallelism
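
The compiler's job of packing independent operations into fixed-width bundles can be illustrated with a greedy in-order packer. This is a minimal sketch, not a real VLIW compiler: the `Instr` record, the `pack_bundles` helper and the instruction sequence are hypothetical, every result is assumed visible to the next bundle (one-cycle latency, which a real load would not have), and instructions are never reordered. It shows the NOP slots that appear when N independent operations cannot be found.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str               # assembly-like text, for display only
    dest: Optional[str]     # register written (None if nothing is written)
    srcs: Tuple[str, ...]   # registers read


NOP = Instr("nop", None, ())


def pack_bundles(prog: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily pack `prog`, in order, into VLIW bundles of `width` slots.

    A new bundle is started when the current one is full, or when the next
    instruction reads (RAW) or rewrites (WAW) a register written by an
    instruction already in the bundle. A WAR pair may share a bundle because
    all slots read their operands before any slot writes (lockstep).
    Empty slots are padded with NOPs.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in prog:
        written = {i.dest for i in current if i.dest is not None}
        raw = any(s in written for s in ins.srcs)
        waw = ins.dest is not None and ins.dest in written
        if len(current) == width or raw or waw:
            bundles.append(current + [NOP] * (width - len(current)))
            current = []
        current.append(ins)
    if current:
        bundles.append(current + [NOP] * (width - len(current)))
    return bundles


prog = [
    Instr("lw  r1, 0(r10)", "r1", ("r10",)),
    Instr("add r2, r3, r4", "r2", ("r3", "r4")),
    Instr("sub r5, r1, r2", "r5", ("r1", "r2")),  # needs r1 and r2 from above
    Instr("or  r6, r7, r8", "r6", ("r7", "r8")),
]
for bundle in pack_bundles(prog, width=3):
    print(" | ".join(i.text for i in bundle))
```

With width 3 this prints two bundles, each ending in a NOP, because `sub` cannot share a bundle with its producers. A real compiler would also reorder instructions and model real latencies; this in-order sketch does neither.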

Superscalar

  • Fetch, decode, execute, retire multiple instructions per cycle
    • N-wide superscalar ⇒ N instructions per cycle
    • Add hardware resources
    • Hardware performs the dependence checking between concurrently fetched instructions
  • Multiple-Issue Datapath (or SS) Responsibilities
    • Data dependencies – data hazards

    • Procedural dependencies – control hazards

      • Use dynamic branch prediction
    • Resource conflicts – structural hazards
      - Fix by duplicating the resource or by pipelining the resource

      All three problems above become worse in a superscalar processor
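
The hardware dependence check among concurrently fetched instructions, and why its cost grows quickly with width N, can be sketched as follows. This is a minimal sketch, assuming in-order issue, two source registers per instruction, and a check for RAW only; `Instr`, `issuable_prefix`, `raw_comparators` and the fetch group are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def issuable_prefix(group: List[Instr]) -> List[Instr]:
    """In-order issue check for one fetch group of an N-wide superscalar.

    Issues instructions from the front of the group until one reads a
    register written by an earlier instruction in the same group (RAW);
    that instruction and everything after it wait for a later cycle.
    Only RAW is checked here, to keep the sketch short.
    """
    issued: List[Instr] = []
    for ins in group:
        written = {i.dest for i in issued if i.dest is not None}
        if any(s in written for s in ins.srcs):
            break
        issued.append(ins)
    return issued


def raw_comparators(width: int, srcs_per_instr: int = 2) -> int:
    """Register-name comparators for the intra-group RAW check: each source of
    the k-th instruction (0-based) is compared with the k earlier destinations."""
    return srcs_per_instr * width * (width - 1) // 2


group = [
    Instr("a", "r1", ("r2", "r3")),
    Instr("b", "r4", ("r5", "r6")),
    Instr("c", "r7", ("r1", "r4")),   # depends on a and b
    Instr("d", "r8", ("r9", "r10")),  # independent, but in-order issue stops at c
]
print([i.name for i in issuable_prefix(group)])  # ['a', 'b']
print([raw_comparators(n) for n in (2, 4, 8)])   # [2, 12, 56]
```

The comparator count grows roughly with N², which is one concrete reason the dependency checking of a wide superscalar is costly.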

Out of Order (OoO) Execution

  • Instruction-issue ⇒ initiate execution
    • Instruction lookahead capability – fetch, decode and issue instructions beyond the current instruction
  • Instruction-completion ⇒ complete execution
    • Processor lookahead capability – complete issued instructions beyond the current instruction
  • Instruction-commit ⇒ write back results to the register file (RegFile) or data cache (D$), i.e., change the machine state


  1. In-Order Issue with In-Order Completion

    • The simplest policy: issue instructions in exact program order and complete them in the order they were fetched (i.e., in program order)

  2. In-Order Issue with Out-of-Order Completion

    • With out-of-order completion, a later instruction may complete before a previous instruction
    • With out-of-order completion, an instruction is stalled when there is a resource conflict (e.g., for a functional unit) or when it needs a result that has not yet been computed

    • Handling Output Dependencies
      • In IOI-OOC

        When this sequence is executed, I1 may complete after I2 ⇒ I5 then reads R3 holding the wrong value ⇒ I2 has an output dependency on I1 (write after write, WAW)

        So I2 may have to be stalled ⇒ this requires more dependency-checking hardware (needed for both read-after-write (RAW) and write-after-write (WAW) dependencies)

  3. Out-of-Order Issue with Out-of-Order Completion

    • With in-order issue the processor stops decoding instructions whenever a decoded instruction has a resource conflict or a data dependency on an issued, but uncompleted instruction
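    • Instructions beyond the conflicting one are fetched, decoded, and stored in a buffer; the buffered instructions that have no resource conflict or data dependency are flagged
    • Flagged instructions are issued from the buffer regardless of program order
    • Here, issue means sending an instruction to an execution unit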
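
The in-order-issue, out-of-order-completion policy (item 2 above) and its output-dependency stall can be sketched as a small issue/complete timeline. This is a minimal sketch under stated assumptions: one instruction issues per cycle, each instruction has a fixed latency, a result is usable in the cycle it completes, and the WAW rule is conservative (wait until the earlier write has completed). The `Instr` record, `simulate_ioi_ooc`, and the three-instruction sequence A, B, C are hypothetical, not the example from the lecture.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    latency: int  # cycles from issue to completion (assumed)


def simulate_ioi_ooc(prog: List[Instr]) -> List[Tuple[str, int, int]]:
    """Return (name, issue_cycle, complete_cycle) for each instruction.

    In-order issue, one instruction per cycle; completion order follows
    latency, so it can be out of order. An instruction stalls at issue if
      - a source is still being produced (true dependency, RAW), or
      - its destination has an earlier write in flight (output dependency,
        WAW), so the later write cannot land first.
    """
    ready_at: Dict[str, int] = {}  # register -> cycle its pending write completes
    next_issue = 0
    timeline: List[Tuple[str, int, int]] = []
    for ins in prog:
        issue = next_issue
        for r in ins.srcs:                    # RAW check
            issue = max(issue, ready_at.get(r, 0))
        if ins.dest is not None:              # WAW check (conservative)
            issue = max(issue, ready_at.get(ins.dest, 0))
        complete = issue + ins.latency
        if ins.dest is not None:
            ready_at[ins.dest] = complete
        timeline.append((ins.name, issue, complete))
        next_issue = issue + 1                # later instructions cannot issue earlier
    return timeline


prog = [
    Instr("A", "r3", ("r1", "r2"), latency=4),  # long-latency op writing r3
    Instr("B", "r4", ("r5",), latency=1),       # independent: completes before A
    Instr("C", "r3", ("r6",), latency=1),       # also writes r3: WAW with A
]
for name, issued, done in simulate_ioi_ooc(prog):
    print(f"{name}: issue {issued}, complete {done}")
```

B issues in cycle 1 and completes in cycle 2, before A completes in cycle 4 (out-of-order completion). C is held until cycle 4 so that its write to r3 lands after A's, leaving the correct final value in r3.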

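Antidependencies (write after read, WAR)

  • A kind of data dependency: a later instruction writes a register that an earlier instruction reads
  • With out-of-order issue (OOI), antidependencies must also be handled, since a later instruction could overwrite a register before an earlier instruction has read it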

Storage Conflicts and Register Renaming

  • Storage conflicts can be reduced (or eliminated) by increasing or duplicating the troublesome resource
  • Register renaming
    • the processor renames the original register identifier in the instruction to a new register (one not in the visible register set)
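
Register renaming can be sketched with a map from architectural to physical registers. This is a minimal sketch, assuming an unlimited supply of physical registers named p0, p1, ... that are never freed (a real design recycles them); `Instr`, `rename`, and the four-instruction sequence A to D are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def rename(prog: List[Instr], arch_regs: List[str]) -> List[Instr]:
    """Rewrite register names to physical registers p0, p1, ...

    Sources read the current mapping; every destination gets a fresh
    physical register, so later writers never reuse an earlier writer's name.
    """
    mapping: Dict[str, str] = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed: List[Instr] = []
    for ins in prog:
        srcs = tuple(mapping[s] for s in ins.srcs)  # read the old mapping first
        dest = None
        if ins.dest is not None:
            dest = f"p{next_phys}"
            next_phys += 1
            mapping[ins.dest] = dest                # later readers see the new name
        renamed.append(Instr(ins.name, dest, srcs))
    return renamed


prog = [
    Instr("A", "r3", ("r3", "r5")),  # r3 = r3 op r5
    Instr("B", "r4", ("r3",)),       # r4 = r3 + 1
    Instr("C", "r3", ("r5",)),       # r3 = r5 + 1  (WAW with A, WAR with B)
    Instr("D", "r7", ("r3", "r4")),  # r7 = r3 op r4
]
for ins in rename(prog, ["r3", "r4", "r5", "r7"]):
    print(ins.name, ins.dest, "<-", ", ".join(ins.srcs))
```

After renaming, C writes p6 instead of sharing a name with A's p4, so the WAW with A and the WAR with B disappear, while D still reads C's result (p6) and B's result (p5): only the true dependencies remain.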

Superscalar Pros and Cons

  • Advantages
    • Higher IPC (instructions per cycle)
  • Disadvantages
    • Higher complexity for dependency checking
      • Requires checking within a pipeline stage
      • Renaming becomes more complex in an OoO processor
    • More hardware resources needed

Superscalar Summary

  • Must handle, with a combination of hardware and software fixes, the fundamental limitations of
    • Data dependencies – aka data hazards

      • Made worse in OOI-OOC
    • Procedural dependencies – aka control hazards

      • Use dynamic branch prediction
    • Resource conflicts – aka structural hazards
      - Resource conflicts can be eliminated by duplicating the resource or by pipelining the resource

      All three become worse in a superscalar processor
