Digital Design and Computer Architecture - L1 : Introduction and Basics

namu · May 15, 2024

Based on such understanding:
learn how a modern computer works underneath
evaluate tradeoffs of different designs and ideas
implement a principled design (a simple microprocessor)
learn to systematically debug increasingly complex systems
hopefully enable you to develop novel, out-of-the-box designs

The focus is on basics, principles, precedents, and how to use them to create/implement good designs

Because you are here for a computer science degree.

Regardless of your future direction, learning the principles of digital design & computer architecture will be useful to
design better hardware
design better software
design better systems
make better tradeoffs in design
understand why computers behave the way they do
solve problems better
think "in parallel"
think critically
...

The Transformation Hierarchy
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

Problem

Algorithm

Program/Language

System Software
(VM, OS, MM): virtual machine, operating system, memory manager

ISA(Architecture)
ISA(Instruction Set Architecture)
Interface/contract between SW and HW.
What the programmer assumes hardware will satisfy.

Microarchitecture
An implementation of the ISA
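To make the contract-versus-implementation split concrete, here is a minimal sketch in Python. It assumes a hypothetical 8-bit ADD instruction; the names `add_spec`, `add_ripple_carry`, and `add_carry_select` are invented for this illustration and do not come from the lecture. The specification plays the role of the ISA, and the two adders play the role of two different microarchitectures that both satisfy it.

```python
# Hypothetical 8-bit "ISA" rule: ADD returns (a + b) mod 256.
WIDTH = 8
MASK = (1 << WIDTH) - 1


def add_spec(a: int, b: int) -> int:
    """The contract software relies on: 8-bit wrap-around addition."""
    return (a + b) & MASK


def add_ripple_carry(a: int, b: int) -> int:
    """Implementation 1: add one bit at a time, carry ripples upward."""
    result, carry = 0, 0
    for i in range(WIDTH):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))
        result |= s << i
    return result


def add_carry_select(a: int, b: int) -> int:
    """Implementation 2: compute the upper half for both possible
    carry-in values, then select the right one once the lower half is known."""
    half = WIDTH // 2
    low_mask = (1 << half) - 1
    low = (a & low_mask) + (b & low_mask)
    carry = low >> half
    hi_a, hi_b = a >> half, b >> half
    hi_if_0 = (hi_a + hi_b) & low_mask
    hi_if_1 = (hi_a + hi_b + 1) & low_mask
    high = hi_if_1 if carry else hi_if_0
    return (high << half) | (low & low_mask)


def implementations_agree() -> bool:
    """Check both implementations against the contract for all 8-bit inputs."""
    return all(
        add_ripple_carry(a, b) == add_spec(a, b) == add_carry_select(a, b)
        for a in range(256)
        for b in range(256)
    )
```

Software written against `add_spec` cannot tell which adder is underneath. The two implementations differ only in properties the contract does not mention, such as speed and hardware cost.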

Logic (digital logic circuits)
Implements the microarchitecture.

Devices
physical devices.
Implement logic gates.
They operate based on the physics of electrons.
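The layering of devices, gates, and logic can be sketched as follows. This is a simplified model, assuming a single NAND function stands in for the physical device level (a real NAND gate is built from transistors); all function names are hypothetical. Every other gate, and then a 1-bit full adder, is built only from that one primitive.

```python
def nand(a: int, b: int) -> int:
    """Stand-in for the device level: the only primitive we allow."""
    return 1 - (a & b)


# Logic gates built only from NAND.
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """A 1-bit full adder built from the gates above.
    Returns (sum_bit, carry_out)."""
    p = xor_(a, b)
    s = xor_(p, cin)
    cout = or_(and_(a, b), and_(p, cin))
    return s, cout
```

Chaining several `full_adder` stages gives a multi-bit adder, which is the kind of block a microarchitecture uses to implement an ISA's ADD instruction.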

Electrons

Computer Architecture

  • is the science and art of designing computing platforms(hardware, interface, system SW, and programming model)
  • to achieve a set of design goals
    -- E.g., highest performance on earth on workloads X, Y, Z
    -- E.g., longest battery life at a form factor that fits in your pocket with cost < $$$ CHF
    -- E.g., best average performance across all known workloads at the best performance/cost ratio
    -- ...
    -- Designing a supercomputer is different from designing a smartphone -> But, many fundamental principles are similar

Different Platforms, Different Goals

TPU Printed Circuit Board. It can be inserted in the slot for an SATA disk in a server, but the card uses PCIe Gen3 x16.

Systolic data flow of the Matrix Multiply Unit. Software has the illusion that each 256B input is read at once, and they instantly update one location of each of 256 accumulator RAMs.
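The idea behind systolic data flow can be sketched with a small cycle-by-cycle simulation. This is a toy weight-stationary model written for illustration, not the TPU's actual design: weights stay fixed in a grid of processing elements (PEs), inputs enter from the left with a one-cycle skew per row, and partial sums move down one PE per cycle. The function name `systolic_matmul` and the register names are hypothetical.

```python
def systolic_matmul(X, W):
    """Compute Y = X @ W on a simulated K x N grid of PEs.
    X is M x K (inputs), W is K x N (weights held in the PEs)."""
    M, K, N = len(X), len(W), len(W[0])
    a_reg = [[0] * N for _ in range(K)]  # inputs moving left to right
    p_reg = [[0] * N for _ in range(K)]  # partial sums moving top to bottom
    Y = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    # Row k of the grid is fed with a skew of k cycles.
                    m = t - k
                    a_in = X[m][k] if 0 <= m < M else 0
                else:
                    a_in = a_reg[k][j - 1]
                p_in = p_reg[k - 1][j] if k > 0 else 0
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * W[k][j]
        a_reg, p_reg = new_a, new_p

        # Finished sums leave the bottom row, one diagonal per cycle.
        for j in range(N):
            m = t - j - (K - 1)
            if 0 <= m < M:
                Y[m][j] = p_reg[K - 1][j]
    return Y


def naive_matmul(X, W):
    """Reference result: what software 'sees'."""
    return [
        [sum(X[m][k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
        for m in range(len(X))
    ]
```

Internally the data moves as a diagonal wavefront over several cycles, yet the result matches `naive_matmul`, which is the "illusion" the caption describes: software only sees a whole input row applied at once.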

TPU
New ML applications (vs. TPU3)

  • Computer vision
  • Natural Language Processing(NLP)
  • Recommender system
  • Reinforcement learning that plays Go

Tesla self-driving computer

  • ML accelerator: 260 mm², 6 billion transistors, 600 GFLOPS GPU, 12 ARM 2.2 GHz CPUs.
  • Two redundant chips for better safety.

NVIDIA is claiming a 7x improvement in dynamic programming algorithm (DPX instructions) performance on a single H100 versus naive execution on an A100.

To achieve the highest energy efficiency and performance:
we must take the expanded view
of computer architecture

What is Computer Architecture?

  • The science and art of designing, selecting, and interconnecting hardware components and designing the hardware/software interface to create a computing system that meets functional, performance, energy consumption, cost, and other specific goals.

Why Study Computer Architecture?

  • Enable better systems: make computers faster, cheaper, smaller, more reliable,...
    -- By exploiting advances and changes in underlying technology/circuits
  • Enable new applications
    -- Life-like 3D visualization 20 years ago? Virtual reality?
    -- Self-driving cars?
    -- Personalized genomics? Personalized medicine?
  • Enable better solutions to problems
    -- Software innovation is built on trends and changes in computer architecture
    --- > 50% performance improvement per year has enabled this innovation
  • Understand why computers work the way they do

Industry is in a large paradigm shift (to novel architectures)
many different potential system designs possible

Many difficult problems motivating and caused by the shift

  • Huge hunger for data and new data-intensive applications
  • Power/energy/thermal constraints
  • Complexity of design
  • Difficulties in technology scaling
  • Memory bottleneck
  • Reliability problems
  • Programmability problems
  • Security and privacy issues

Computing landscape is very different from 10-20 years ago
Applications and technology both demand novel architectures

Every component and its interfaces, as well as entire system designs, are being re-examined

Performance
Energy Efficiency
Sustainability

A memory access consumes 6400X
the energy of a simple integer addition
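A quick back-of-the-envelope calculation shows why this ratio matters. The sketch below measures energy in units of "one integer addition" and uses the 6400X figure quoted above; the function names and the example workload are assumptions made for illustration.

```python
# Ratio quoted in the text: one memory access costs as much energy
# as 6400 simple integer additions.
MEM_ACCESS_COST_IN_ADDS = 6400


def energy_in_add_units(num_adds: int, num_mem_accesses: int) -> int:
    """Total energy, measured in units of one integer addition."""
    return num_adds + MEM_ACCESS_COST_IN_ADDS * num_mem_accesses


def memory_energy_fraction(num_adds: int, num_mem_accesses: int) -> float:
    """Fraction of the total energy spent on memory accesses."""
    total = energy_in_add_units(num_adds, num_mem_accesses)
    return MEM_ACCESS_COST_IN_ADDS * num_mem_accesses / total
```

For a hypothetical workload of 1000 additions and a single memory access, `memory_energy_fraction(1000, 1)` is about 0.86: one access accounts for roughly 86% of the energy. This is the motivation for architectures that minimize data movement.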

Computing Architectures with Minimal Data Movement

UPMEM Processing-in-DRAM Engine

  • Processing in DRAM Engine
  • Includes standard DIMM modules, with a large number of DPU processors combined with DRAM chips.
  • Replaces standard DIMMs
    -- DDR4 R-DIMM modules
    --- 8GB+128 DPUs (16 PIM chips)
    --- Standard 2x-nm DRAM process
    -- Large amounts of compute & memory bandwidth
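The module figures above imply how compute and memory are paired. The short calculation below derives this from the numbers on the slide only, assuming "8GB" means 8 × 2^30 bytes and that DPUs and memory are split evenly; the constant names are made up for this sketch.

```python
# Figures from the slide.
DIMM_CAPACITY_BYTES = 8 * 2**30  # assumption: "8GB" read as 8 GiB
DPUS_PER_DIMM = 128
PIM_CHIPS_PER_DIMM = 16

# Derived quantities, assuming an even split across chips and DPUs.
dpus_per_chip = DPUS_PER_DIMM // PIM_CHIPS_PER_DIMM
bytes_per_dpu = DIMM_CAPACITY_BYTES // DPUS_PER_DIMM
mib_per_dpu = bytes_per_dpu // 2**20
```

Under these assumptions each PIM chip holds 8 DPUs, and each DPU sits next to 64 MiB of DRAM, so every small processor works on its own nearby slice of memory.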

With an FPGA, the hardware itself can be redesigned (reconfigured).

Samsung Function-in-Memory DRAM (2021)
Programmable Computing Unit

  • Configuration of PCU block
    -- Interface unit to control data flow
    -- Execution unit to perform operations
    -- Register group
    --- 32 entries of CRF for instruction memory
    --- 16 GRF for weight and accumulation
    --- 16 SRF to store constants for MAC operations

Combining GDDR6-AiM with a CPU or GPU, instead of using typical DRAM, makes certain computations 16 times faster. GDDR6-AiM is widely expected to be adopted for machine learning, high-performance computing, and big data computation and storage.

Specialized Processing in Memory (2015)
A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

Simple Processing in Memory (2015)
PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture

Another example is a device technology:
Intel Optane Persistent Memory (2019)

  • Non-volatile main memory
  • Based on 3D-XPoint Technology
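    The memory is non-volatile, so its contents survive power-off.
    When data does not have to be reloaded from disk, this should improve performance.
    However, the product has reportedly been discontinued.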

Emerging Memories Also Need Intelligent Controllers
Architecting Phase Change Memory as a Scalable DRAM Alternative

Cerebras's Wafer Scale ML Engine (2019)
The largest ML accelerator chip
400,000 cores
The 2021 version has 850,000 cores

Fundamentally High-Performance
Energy-Efficient
Data-Centric
Computing Architectures

A data-centric architecture is one in which
memory and computation are more tightly coupled.

Google's Video Coding Unit (2021)
Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild

Security is about preventing unforeseen consequences

The Story of RowHammer

  • One can predictably induce bit flips in commodity DRAM chips
    -- All tested DRAM chips are vulnerable
  • First example of how a simple hardware failure mechanism can create a widespread system security vulnerability

Modern DRAM is Prone to Disturbance Errors

Repeatedly reading a row enough times (before memory gets refreshed) induces disturbance errors in adjacent rows in most real DRAM chips you can buy today
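The mechanism can be illustrated with a toy model. This is a deliberately simplified sketch, not a model of real DRAM physics: it assumes every read activates the row, that a fixed and hypothetical `hammer_threshold` of activations within one refresh window is enough to disturb both neighbors, and that a refresh resets the count. The class and method names are invented for this example.

```python
class ToyDram:
    """Toy disturbance model: hammering one row flips bits in its neighbors."""

    def __init__(self, num_rows: int, hammer_threshold: int):
        # hammer_threshold is a made-up parameter for illustration only.
        self.threshold = hammer_threshold
        self.activations = [0] * num_rows  # activations since last refresh
        self.disturbed = [False] * num_rows  # True if a row has bit flips

    def read(self, row: int) -> None:
        """Assumption: each read opens (activates) the row."""
        self.activations[row] += 1
        if self.activations[row] >= self.threshold:
            for neighbor in (row - 1, row + 1):
                if 0 <= neighbor < len(self.disturbed):
                    self.disturbed[neighbor] = True

    def refresh(self) -> None:
        """Refresh restores charge, so the activation counts start over.
        Bits that already flipped stay flipped."""
        self.activations = [0] * len(self.activations)


def hammer(dram: ToyDram, row: int, times: int) -> None:
    """Read the same row repeatedly without touching its neighbors."""
    for _ in range(times):
        dram.read(row)
```

In this model, hammering row 3 just under the threshold, refreshing, and hammering again causes no errors, while crossing the threshold inside one refresh window corrupts rows 2 and 4 even though they were never accessed.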

One Can Take Over an Otherwise-Secure System
Flipping Bits in Memory Without Accessing Them:
An Experimental Study of DRAM Disturbance Errors
