Datapath Components
Introduction
A system contains three main kinds of components: control, execution, and storage
Five-Stage Pipeline

1. Storage Components
Memory
Holds the processes that are currently running, together with their data
Register
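Registers sit inside the CPU. The CPU can access them very quickly, but their capacity is very small
Before a processor can execute an instruction, all the data that instruction needs must first be loaded into registers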
Cache
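A buffer between memory and the registers. Data in memory that is accessed frequently is kept here as a copy, so that when a register later needs it, it can be fetched quickly from the cache instead of waiting for memory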
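The effect of the cache can be observed from ordinary code. The sketch below is only an illustration and is not taken from these notes: the size `N` and all function names are assumptions. Both functions add up the same matrix. The row-order walk touches neighbouring addresses, so data already copied into the cache is reused, while the column-order walk jumps `N` elements at a time and tends to wait for memory more often on typical hardware.

```c
#include <stddef.h>

#define N 512  /* assumed size, chosen only for illustration */

static int grid[N][N];

/* Fill the matrix with a simple pattern so both sums can be compared. */
void fill_grid(void) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (int)((i + j) % 7);
}

/* Row order: consecutive accesses are adjacent in memory. */
long sum_row_order(void) {
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += grid[i][j];
    return total;
}

/* Column order: consecutive accesses are N ints apart. */
long sum_column_order(void) {
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += grid[i][j];
    return total;
}
```

Both functions return the same value. Only the access pattern differs, and any timing difference depends on the machine and compiler settings.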
2. Execution Components
Basic Components
- Arithmetic Components: adders, multipliers, subtractors, …
- Logical Operation Components: AND, OR, NOT
- Shifters, Comparators
ALU
An ALU contains several basic components and uses the OP code to select the appropriate component for each operation
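The selection step can be pictured as a switch over the OP code. This is a minimal sketch, not a description of real hardware: the `op_code` values and the function name `alu` are hypothetical, and an actual ISA defines its own encoding.

```c
#include <stdint.h>

/* Hypothetical OP codes; a real ISA defines its own encoding. */
enum op_code { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_NOT, OP_SHL, OP_LT };

/* Select the component that matches the OP code and apply it. */
uint32_t alu(enum op_code op, uint32_t a, uint32_t b) {
    switch (op) {
    case OP_ADD: return a + b;          /* adder */
    case OP_SUB: return a - b;          /* subtractor, wraps around on underflow */
    case OP_AND: return a & b;          /* logical AND */
    case OP_OR:  return a | b;          /* logical OR */
    case OP_NOT: return ~a;             /* logical NOT, b is ignored */
    case OP_SHL: return a << (b & 31u); /* shifter, amount kept in 0..31 */
    case OP_LT:  return a < b;          /* comparator, yields 1 or 0 */
    }
    return 0; /* unknown OP code */
}
```

For example, `alu(OP_ADD, 2, 3)` routes the operands through the adder, while `alu(OP_LT, 1, 2)` routes them through the comparator.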
3. Control Components
Decoder
The next instruction is sent from instruction memory to the decoder, which converts it into an OP code (defined by the ISA) and a register selection
- The OP code tells the ALU which ISA-defined operation to perform
- The register selection tells the register file which register to read or write
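Decoding amounts to cutting an instruction word into fields. The sketch below assumes a made-up 16-bit format (4 bits of OP code followed by three 4-bit register numbers). The layout, the `decoded` struct, and the `decode` function are all hypothetical and do not correspond to any particular ISA.

```c
#include <stdint.h>

/* Decoded fields. The 16-bit layout is hypothetical:
   [15:12] OP code, [11:8] destination, [7:4] source 1, [3:0] source 2. */
struct decoded {
    uint8_t op_code;
    uint8_t rd;
    uint8_t rs1;
    uint8_t rs2;
};

/* Split one instruction word into its OP code and register selection. */
struct decoded decode(uint16_t instruction) {
    struct decoded d;
    d.op_code = (instruction >> 12) & 0xF; /* goes to the ALU */
    d.rd      = (instruction >> 8) & 0xF;  /* register to write */
    d.rs1     = (instruction >> 4) & 0xF;  /* first register to read */
    d.rs2     = instruction & 0xF;         /* second register to read */
    return d;
}
```

Under this assumed layout, the word `0x1234` decodes to OP code 1 with destination register 2 and source registers 3 and 4.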
Program Counter (PC)
Tells instruction memory the address at which the next instruction to execute is stored
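The role of the PC is easiest to see in a fetch loop. The toy machine below is an assumption made for illustration: each instruction is just a number that is added to an accumulator, and the value 0 means halt. Only the handling of `pc` is the point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine: every instruction adds its value to an
   accumulator, and the value 0 means "halt". */
int32_t run(const int32_t *instruction_memory, size_t length) {
    size_t pc = 0;           /* program counter: address of the next instruction */
    int32_t accumulator = 0;

    while (pc < length) {
        int32_t instruction = instruction_memory[pc]; /* fetch at PC */
        pc += 1;                                      /* point at the next one */
        if (instruction == 0)
            break;                                    /* halt */
        accumulator += instruction;                   /* execute */
    }
    return accumulator;
}
```

Running the program `{5, 7, 0, 100}` stops at the halt instruction, so the trailing `100` is never fetched.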
Parallel Execution
Multiple Processing Cores (Multiple Processors)
Different instructions can run on multiple processors at the same time, which gives parallel execution
Multiple ALUs within a core
Single Instruction, Multiple Data (SIMD)
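Because a core has only one decoder, the multiple ALUs of the same processor can only execute the same instruction. For example, when the same instruction has to be applied to every element of an array, the work can be spread across the ALUs to take advantage of that parallelism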
Problem
Much of our single-stream (sequential) code cannot benefit from this speedup
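One common reason is a dependency between iterations. The two hypothetical functions below contrast a loop that suits SIMD with one that, written in this straightforward form, does not. The function names are assumptions for illustration.

```c
#include <stddef.h>

/* Every iteration is independent, so several elements can be
   processed by the same instruction at once. */
void scale(float *x, size_t n, float factor) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] * factor;
}

/* Each iteration needs the result of the previous one, so this
   form has to run one element after another. */
void running_sum(float *x, size_t n) {
    for (size_t i = 1; i < n; i++)
        x[i] = x[i] + x[i - 1];
}
```

In `running_sum`, element `i` cannot be computed until element `i - 1` is finished, which is the kind of sequential chain that leaves the extra ALUs idle.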
OpenMP
Header File
#include <omp.h>
Define Parallel Region
#pragma omp parallel
{
    // the code in this block is executed by every thread in the team
}
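A small sketch of a complete parallel region follows. The function name `greet_from_each_thread` is hypothetical, and the `#ifdef _OPENMP` fallback is an addition made here so that the code still builds when OpenMP is switched off. The region also uses `atomic` to count the threads safely.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch still builds when OpenMP is disabled. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Run a parallel region and return how many threads executed it. */
int greet_from_each_thread(void) {
    int visits = 0;

    #pragma omp parallel
    {
        /* Every thread in the team runs this block once. */
        printf("hello from thread %d\n", omp_get_thread_num());

        #pragma omp atomic
        visits += 1;   /* count this thread without a data race */
    }
    return visits;
}
```

With GCC or Clang the code is built with `-fopenmp`. Without that flag the pragmas are ignored and the block runs once on a single thread. The order of the printed lines is not fixed.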
Directives
parallel
Define a parallel region whose code is executed by multiple threads in parallel
for
Divide the iterations of a for loop inside a parallel region among the threads
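As a minimal sketch, the hypothetical function `fill_squares` below shares out a loop whose iterations do not depend on each other.

```c
/* Store i * i in out[i]. The iterations are independent, so they
   can be divided among the threads of the parallel region. */
void fill_squares(long *out, int n) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = (long)i * i;
    }
}
```

Each thread receives a subset of the values of `i`. Since every iteration writes a different element of `out`, no extra synchronization is needed.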
atomic
Require the update to the memory location in the following statement to be performed atomically
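A sketch of where `atomic` is needed: several threads may try to increment the same counter at the same moment. The function `count_even` is a hypothetical example, and it uses `parallel for`, the combined form of the `parallel` and `for` directives.

```c
/* Count the even values in data. Several threads may reach the
   increment at once, so the update is marked atomic. */
int count_even(const int *data, int n) {
    int count = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp atomic
            count += 1;
        }
    }
    return count;
}
```

Without the `atomic` line, two threads could read the same old value of `count` and one of the increments would be lost.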
simd
Tell the compiler to vectorize a loop so that multiple ALUs apply the same simple, repetitive operation to the elements of an array
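A typical candidate is an element-wise update. The function below is a hypothetical sketch of the well-known "a times x plus y" loop. The name `saxpy` and its parameters are chosen here for illustration.

```c
/* y[i] = a * x[i] + y[i] for every element. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The directive is a request to vectorize. Whether vector instructions are actually emitted depends on the compiler and the target CPU. GCC and Clang honour it with `-fopenmp` or `-fopenmp-simd`.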
Clauses
num_threads
Set the number of threads to use
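A short sketch of the clause in use. The function `count_team` is hypothetical and `requested` is assumed to be positive.

```c
/* Request a team of `requested` threads and return how many ran. */
int count_team(int requested) {
    int team_size = 0;

    #pragma omp parallel num_threads(requested)
    {
        #pragma omp atomic
        team_size += 1;   /* each thread in the team adds one */
    }
    return team_size;
}
```

The clause is a request: the runtime may provide fewer threads than asked for, so the result of `count_team(4)` lies between 1 and 4.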
private
Give each thread its own instance of the listed variables
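The sketch below shows a temporary that would be shared by default because it is declared outside the region. The function `square_all` and the variable `tmp` are hypothetical.

```c
/* Square every element using a per-thread temporary. */
void square_all(int *data, int n) {
    int tmp = 0;   /* declared outside the region, so shared by default */

    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = data[i];        /* each thread writes its own tmp */
        data[i] = tmp * tmp;
    }
}
```

A private copy starts out uninitialized inside the region, which is why `tmp` is assigned before it is read in every iteration.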
reduction
- While the code executes in parallel, each thread has its own private copy of the variable
- When the parallel work is finished, the private copies are combined into the shared variable using the reduction operator
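The two steps above can be sketched with a sum. The function `sum_to` is a hypothetical example and `+` is the reduction operator.

```c
/* Sum 1..n with a reduction on `total`. */
long sum_to(int n) {
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += i;   /* updates the thread's private copy */

    return total;     /* private copies have been added together */
}
```

Compared with marking the update `atomic`, a reduction lets each thread accumulate locally and synchronizes only once at the end.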
shared
Specify variables that should be shared among all threads
schedule
Set how loop iterations are assigned to threads: static, dynamic, or guided
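The choice matters most when iterations take different amounts of time. With `static`, iterations are divided into fixed chunks up front. With `dynamic`, chunks are handed out as threads become free. With `guided`, chunks start large and shrink. The sketch below assumes an uneven workload: the helper `work`, the function `total_work`, and the chunk size 4 are hypothetical choices.

```c
/* Hypothetical uneven workload: iteration i costs about i steps. */
static long work(int i) {
    long steps = 0;
    for (int k = 0; k < i; k++)
        steps += 1;
    return steps;
}

/* Hand out iterations in chunks of 4 as threads become free. */
long total_work(int n) {
    long total = 0;

    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += work(i);

    return total;
}
```

With `schedule(static)` the thread that receives the last, most expensive iterations would probably finish well after the others. The dynamic schedule trades a little bookkeeping for a more even load.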