Datapath Components

Introduction

A system contains three main kinds of components: control, execution, and storage

Five Stage Pipeline

1. Storage Components

Memory

Stores the processes that are currently running and their data

Register

Registers sit inside the CPU. The CPU can access them very quickly, but their capacity is very small

Before a processor can execute an instruction, all the data that instruction needs must first be loaded into registers

Cache

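A buffer between memory and the registers. Data in memory that is accessed frequently is kept here as a copy, so that the next time a register needs that data it can be fetched quickly from the cache instead of waiting for memory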

2. Execution Components

Basic Components

  • Arithmetic Components: adders, multipliers, subtractors, …
  • Logical Operation Components: AND, OR, NOT
  • Shifters, Comparators

ALU

An ALU contains multiple basic components and, based on the OP code, selects the appropriate component to perform the operation
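The selection step can be pictured as a switch on the OP code. The following is only a minimal sketch: the `Opcode` values and the `alu` function are hypothetical names made up for illustration, not taken from any real ISA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; a real ISA defines its own. */
typedef enum { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_SHL } Opcode;

/* Pick one basic component based on the opcode and apply it to a and b. */
uint32_t alu(Opcode op, uint32_t a, uint32_t b) {
    switch (op) {
    case OP_ADD: return a + b;           /* adder */
    case OP_SUB: return a - b;           /* subtractor */
    case OP_AND: return a & b;           /* logical AND */
    case OP_OR:  return a | b;           /* logical OR */
    case OP_SHL: return a << (b & 31u);  /* shifter, shift amount kept in 0..31 */
    }
    assert(!"unknown opcode");           /* reached only for an invalid op value */
    return 0;
}
```

In hardware the components typically all compute at once and a multiplexer picks one result; the switch only models the selection.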

3. Control Components

Decoder

The next instruction is sent from instruction memory to the decoder, which translates it into an OP code (defined by the ISA) and a register selection

  • The OP code tells the ALU which ISA-defined instruction to execute
  • The register selection tells the register file which register to read or write
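Decoding amounts to slicing fixed bit fields out of the instruction word. The sketch below assumes a made-up 16-bit layout (4-bit opcode followed by three 4-bit register numbers); the `Decoded` struct and `decode` function are hypothetical, and a real ISA specifies its own field positions and widths.

```c
#include <stdint.h>

/* Hypothetical 16-bit instruction layout (not from any real ISA):
 *   bits [15:12] opcode, [11:8] rd, [7:4] rs1, [3:0] rs2 */
typedef struct {
    uint8_t opcode;  /* sent to the ALU */
    uint8_t rd;      /* register to write */
    uint8_t rs1;     /* first register to read */
    uint8_t rs2;     /* second register to read */
} Decoded;

/* Split an instruction word into its opcode and register selection. */
Decoded decode(uint16_t instr) {
    Decoded d;
    d.opcode = (instr >> 12) & 0xF;
    d.rd     = (instr >> 8) & 0xF;
    d.rs1    = (instr >> 4) & 0xF;
    d.rs2    = instr & 0xF;
    return d;
}
```

Under this assumed layout, the word `0x1234` decodes to opcode 1 with rd = 2, rs1 = 3, and rs2 = 4.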

Program Counter (PC)

Tells the instruction memory which address holds the next instruction to execute


Parallel Execution

Multiple Processing Cores (Multiple Processors)

Different instructions can run on multiple processors at the same time to achieve parallelism

Multiple ALUs within a core

Single Instruction, Multiple Data (SIMD)

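Because a core has only one decoder, the multiple ALUs of the same processor can only execute the same instruction. For example, if we want to apply the same instruction to every element of an array, we can exploit the parallelism that the multiple ALUs provide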

Problem

Much of our single-stream code cannot benefit from this speedup


OpenMP

Header File

#include <omp.h>

Define Parallel Region

#pragma omp parallel
{
	// code placed here is executed by every thread in the team
}
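As a small illustration of what a parallel region does, the sketch below counts how many threads executed the block. The function name `count_threads_in_region` is made up for this example; it uses only pragmas, so it does not need `omp.h`.

```c
/* Run a parallel region and return how many threads executed it. */
int count_threads_in_region(void) {
    int count = 0;  /* shared by all threads in the region */
    #pragma omp parallel
    {
        /* Every thread in the team runs this block once. */
        #pragma omp atomic
        count++;
    }
    return count;
}
```

With GCC or Clang, OpenMP is enabled by compiling with `-fopenmp`. Without that flag the pragmas are ignored, the block runs on a single thread, and the function returns 1.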

Directives

parallel

Define a parallel region whose code is executed by multiple threads in parallel

for

Divide the iterations of a for loop inside the parallel region among the threads
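A minimal sketch of `for` inside a parallel region, assuming every iteration is independent of the others. The function `scale_by_two` is a hypothetical example.

```c
/* out[i] = 2 * in[i]; the iterations are divided among the threads. */
void scale_by_two(const int *in, int *out, int n) {
    #pragma omp parallel
    {
        /* Each thread receives a subset of the values of i. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            out[i] = 2 * in[i];
        }
    }
}
```

When the region contains nothing but the loop, the two directives can be combined into `#pragma omp parallel for`.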

atomic

Specify that the memory update in the following statement is performed atomically
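The sketch below shows why this matters: several threads increment one shared counter, and without `atomic` two threads could read the same old value and lose an update. The function `count_even` is a hypothetical example.

```c
/* Count the even numbers in data[0..n-1] using one shared counter. */
int count_even(const int *data, int n) {
    int count = 0;  /* shared: every thread updates the same variable */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            /* The read-modify-write of count happens as one indivisible step. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```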

simd

Tell the compiler to use multiple ALUs to compute simple, repetitive operations on arrays
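A typical candidate is an element-wise loop like the sketch below. It assumes the three arrays do not overlap, so that every iteration is independent; `add_arrays` is a hypothetical function name. Whether vector instructions are actually generated depends on the compiler and the target CPU.

```c
/* out[i] = a[i] + b[i]; assumes a, b, and out do not overlap. */
void add_arrays(const float *a, const float *b, float *out, int n) {
    /* Ask the compiler to process several values of i per instruction. */
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```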

Clauses

num_threads

Set the number of threads to use

private

Specify variables for which each thread should have its own instance
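A minimal sketch of `private`, using a scratch variable that is declared outside the loop. The function `square_all` and the variable `tmp` are hypothetical names. Note that each private copy starts out uninitialized, so the loop body assigns it before reading it.

```c
/* out[i] = in[i] * in[i], using a per-thread scratch variable. */
void square_all(const int *in, int *out, int n) {
    int tmp = 0;  /* declared outside the loop, so it would be shared by default */
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = in[i] * in[i];  /* each thread writes only its own copy of tmp */
        out[i] = tmp;
    }
}
```

Without `private(tmp)`, all threads would write to the same `tmp`, and one thread could store another thread's value into `out[i]`.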

reduction

  1. While the code executes in parallel, each thread has its own copy of the variable
  2. After all threads finish, the per-thread results are combined into the shared variable
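The two steps can be seen in a sum over an array. In this sketch `reduction(+:sum)` gives each thread its own `sum` starting at 0 and adds the partial sums together at the end; `sum_array` is a hypothetical function name.

```c
/* Return data[0] + data[1] + ... + data[n-1]. */
long sum_array(const int *data, int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += data[i];  /* updates the thread's own copy of sum */
    }
    return sum;  /* holds the combined result of all threads */
}
```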

shared

Specify variables that should be shared among all threads

schedule

Set how loop iterations are assigned to threads: static, dynamic, or guided
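With `static`, chunks of iterations are assigned to threads before the loop runs; with `dynamic`, a thread takes the next chunk whenever it becomes free; `guided` works like `dynamic` but starts with large chunks that shrink over time. The sketch below uses `dynamic` on a loop whose iterations have uneven cost. The function `prefix_triangles` and the chunk size of 4 are arbitrary choices for illustration, not tuned values.

```c
/* out[i] = 0 + 1 + ... + i; later iterations do more work than earlier ones. */
void prefix_triangles(long *out, int n) {
    /* Threads grab chunks of 4 iterations at a time as they become free. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++) {
        long acc = 0;
        for (int k = 0; k <= i; k++) {
            acc += k;
        }
        out[i] = acc;
    }
}
```

The schedule only changes which thread runs which iteration, so the values written to `out` are the same for every schedule kind.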