Thread-Level Parallelism and SIMD Parallelism in OpenMP

Thread-Level Parallelism

  • Utilize multiple CPU cores
  • Suitable for task-level parallelism (each thread can execute different instructions)
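
As a minimal sketch of task-level parallelism, the C fragment below uses the OpenMP `sections` construct so that two unrelated tasks may run on different threads. The function names (`sum_array`, `max_array`, `sum_and_max`) are hypothetical and chosen only for illustration; the code assumes `n >= 1` and a compiler invoked with OpenMP enabled (for example `-fopenmp` on GCC or Clang). Without that flag the pragmas are ignored and the code runs serially with the same result.

```c
#include <stddef.h>

/* Hypothetical task 1: sum of all elements. */
static long sum_array(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical task 2: largest element (assumes n >= 1). */
static int max_array(const int *a, size_t n)
{
    int m = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > m)
            m = a[i];
    return m;
}

/* Each section is an independent task; OpenMP may hand the two
   sections to two different threads, each running different code. */
void sum_and_max(const int *a, size_t n, long *sum_out, int *max_out)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        *sum_out = sum_array(a, n);

        #pragma omp section
        *max_out = max_array(a, n);
    }
}
```

The two sections write to different locations, so no synchronization is needed beyond the implicit barrier at the end of the construct.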

SIMD Parallelism

  • Utilize SIMD vector units within a CPU core
  • Ideal for data-level parallelism (all vector lanes execute the same instruction, each on a different data element)
  • Best for simple, repetitive operations on arrays
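
A simple, repetitive array operation of this kind can be sketched with the OpenMP `simd` construct. The function name `saxpy` and its parameters are illustrative assumptions, not taken from the lecture. The pragma asks the compiler to vectorize the loop and asserts that iterations are independent, so the sketch assumes `x` and `y` do not overlap.

```c
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i] for every element.
   Every iteration performs the same operation on different data,
   which is the pattern SIMD units are built for.
   Assumes x and y do not overlap. */
void saxpy(size_t n, float alpha, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

The two levels can be combined: `#pragma omp parallel for simd` splits the iterations across threads and vectorizes each thread's chunk.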

Conditional Execution

D-PP-Lec03a-Conditional-Execution

Terminology

Coherent Execution

  • Same instructions are applied to all elements
  • Necessary for SIMD

Divergent Execution

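  • Occurs when an if/else (or a similar branch) makes different elements or threads execute different parts of the code
  • Makes inefficient use of SIMD or SIMT hardware, because lanes that are not on the active path sit idle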
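
The difference between the two execution styles can be sketched in C. Both functions below compute the same result; the names `transform_branchy` and `transform_select` and the arithmetic itself are hypothetical, chosen only to illustrate the idea. In the first version the if/else sends different elements down different paths. In the second, every element runs the same instructions and the result is picked with a select, which typically maps onto a vector blend or masked operation. Compilers may perform this conversion on their own, so this is an illustration rather than a guaranteed speedup.

```c
#include <stddef.h>

/* Divergent form: elements take different paths through the loop body. */
void transform_branchy(size_t n, const float *x, float *out)
{
    for (size_t i = 0; i < n; i++) {
        if (x[i] >= 0.0f)
            out[i] = 2.0f * x[i];   /* path A */
        else
            out[i] = -x[i];         /* path B */
    }
}

/* Coherent form: every element computes both candidates with the same
   instructions, then selects one. Assumes x and out do not overlap. */
void transform_select(size_t n, const float *x, float *out)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        float doubled = 2.0f * x[i];
        float negated = -x[i];
        out[i] = (x[i] >= 0.0f) ? doubled : negated;
    }
}
```

The coherent form does redundant work for each element (both candidates are always computed), which is the price of keeping all lanes in lockstep.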

Efficient Execution

Memory Constraints

D-PP-Lec03b-Memory-Constraints

Hiding Memory Latency

D-PP-Lec03c-Prefetching

D-PP-Lec03d-Multi-Threading