MPI Program Structure

flowchart TD
    A["#include &quot;mpi.h&quot;<br/>MPI include file"] --> B[Program begins]
    
    B --> C["Serial code<br/>────────<br/>MPI_Init<br/>MPI_Comm_size<br/>MPI_Comm_rank"]
    
    C --> D[Initialize MPI environment]
    
    D --> E["Parallel code<br/>────────<br/>MPI_Send<br/>MPI_Recv<br/>MPI_Barrier"]
    
    E --> F[Do work and message passing calls]
    
    F --> G["Serial code<br/>────────<br/>MPI_Finalize"]
    
    G --> H[Terminate MPI environment]
    
    H --> I[Program ends]
    
    style A fill:#f5f5f5
    style B fill:#fff,stroke-dasharray: 5 5
    style C fill:#f5e6d3
    style D fill:#2d5f3f,color:#fff
    style E fill:#f5e6d3
    style F fill:#2d5f3f,color:#fff
    style G fill:#f5e6d3
    style H fill:#2d5f3f,color:#fff
    style I fill:#fff,stroke-dasharray: 5 5
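The flow above corresponds to a minimal C program (a sketch; compile with mpicc and launch with mpirun):

```c
/* Minimal MPI program skeleton (illustrative sketch). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    /* Serial code runs here, before MPI is initialized. */

    MPI_Init(&argc, &argv);               /* initialize the MPI environment */

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank ID */

    /* Parallel region: work and message-passing calls go here. */
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                       /* terminate the MPI environment */

    /* Serial code may run here, after finalization. */
    return 0;
}
```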

MPI Communicators

Concepts

  • An MPI communicator defines a communication channel through which a group of processes can talk to each other
  • Most MPI calls require a communicator to be specified

MPI_COMM_WORLD

  • A communicator created automatically during initialization
  • It contains all MPI processes as members

MPI Groups

Concept

  • A group is an ordered set of processes, each identified by a rank ID
  • Groups can be used to create new communicators, enabling new communication channels

Usage

// Get the group underlying an existing communicator
MPI_Comm comm = MPI_COMM_WORLD;
MPI_Group world_group;
MPI_Comm_group(comm, &world_group);

// Create a new group containing only specific ranks
int ranks[3] = {0, 2, 4};
MPI_Group new_group;
MPI_Group_incl(world_group, 3, ranks, &new_group);

// Create a new communicator from the group
// (processes not in the group receive MPI_COMM_NULL)
MPI_Comm new_comm;
MPI_Comm_create(comm, new_group, &new_comm);

// Free the groups once they are no longer needed
MPI_Group_free(&new_group);
MPI_Group_free(&world_group);

MPI Basic Routines

MPI_Init

Usage

MPI_Init(&argc, &argv);

Function

  • Must be called in every MPI program, before any other MPI routine
  • Must be called exactly once

MPI_Comm_size

Usage

MPI_Comm_size(MPI_COMM_WORLD, &size);

Function

  • Returns the total number of processes in the specified communicator

MPI_Comm_rank

Usage

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

Function

  • Returns the rank (ID) of the calling process within this communicator

MPI_Wtime

Usage

MPI_Wtime();

Function

  • Returns the elapsed wall-clock time in seconds on the calling process, measured from an arbitrary point in the past; time a region by calling it twice and subtracting
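A typical timing pattern looks like this (sketch; `do_work` is a placeholder for the region being measured):

```c
/* Sketch: timing a code region with MPI_Wtime (returns a double in seconds). */
double t0 = MPI_Wtime();
do_work();                          /* hypothetical function being timed */
double t1 = MPI_Wtime();
printf("Elapsed: %f seconds\n", t1 - t0);
```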

MPI_Send

Usage

MPI_Send(&buf, count, datatype, dest, tag, communicator);
  • &buf: the address of the data to send
  • count: the number of elements to send
  • datatype: e.g., MPI_CHAR, MPI_SHORT, MPI_INT, MPI_FLOAT, MPI_DOUBLE
  • dest: the rank ID of the destination process
  • tag: a non-negative integer identifying the message (matched by the receiver)

Function

  • Sends the message to the destination
  • Blocking: returns only once the send buffer is safe to reuse

MPI_Recv

Usage

MPI_Recv(&buf, count, datatype, source, tag, communicator, &status);
  • source: the rank ID of the sender
  • status: records information (e.g., actual source, tag, error code) about the transfer

Function

  • Receives a message from the source
  • Blocking: returns only once the message has arrived in the receive buffer
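A minimal send/receive pair can be sketched as follows (assumes at least 2 processes; the value 42 is arbitrary):

```c
/* Sketch: rank 0 sends one int to rank 1. */
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    int value = 42;
    /* tag 0 identifies the message; the receiver matches on it */
    MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
    int value;
    MPI_Status status;
    MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    printf("Rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
}
```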

MPI_Isend

  • Non-blocking version of MPI_Send; returns immediately, and completion is checked later with MPI_Wait or MPI_Test

MPI_Irecv

  • Non-blocking version of MPI_Recv; returns immediately, and completion is checked later with MPI_Wait or MPI_Test
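The non-blocking calls return request handles that must later be completed; a sketch of an exchange between ranks 0 and 1 (assumes at least 2 processes):

```c
/* Sketch: non-blocking exchange; the buffers must not be reused until
   MPI_Waitall completes the requests. */
int rank, sendval, recvval;
MPI_Request reqs[2];
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank < 2) {
    int peer = 1 - rank;            /* 0 talks to 1, 1 talks to 0 */
    sendval = rank;
    MPI_Isend(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);
    /* ... other useful work can overlap with the communication here ... */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("Rank %d got %d\n", rank, recvval);
}
```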

MPI_Barrier

Usage

MPI_Barrier(MPI_COMM_WORLD);

Function

  • Blocks each caller until all processes in the communicator have reached the barrier, then all continue

MPI_Finalize

Usage

MPI_Finalize();

Function

  • Terminates the MPI environment; it must be the last MPI call in the program
  • Releases all MPI communicators and resources

MPI Collective Routines

MPI Broadcast

Usage

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm);
  • root: the rank of the broadcast root

Function

  • Broadcasts the buffer from the root to all processes within the specified communicator
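A sketch of a broadcast (the value 100 is arbitrary):

```c
/* Sketch: root (rank 0) broadcasts one int; afterwards every rank
   holds the same value. */
int value = 0, rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
    value = 100;                    /* only the root sets the value */
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* value == 100 on every rank now */
```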

MPI Reduce

Usage

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);
  • sendbuf: the buffer through which every process contributes its data
  • recvbuf: relevant only on the root process; it stores the final reduced result
  • op: the operation that combines the contributions into a single value
MPI_Op      Operation
─────────   ─────────────────────────────────────────
MPI_MAX     Returns the maximum element
MPI_MIN     Returns the minimum element
MPI_SUM     Sums the elements
MPI_PROD    Multiplies all elements
MPI_LAND    Performs a logical AND across the elements
MPI_LOR     Performs a logical OR across the elements
MPI_BAND    Performs a bitwise AND across the elements
MPI_BOR     Performs a bitwise OR across the elements

Function

  • Reduces values from all processes into a single value on the root
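A sketch of a sum reduction (each rank's contribution is arbitrary, here rank + 1):

```c
/* Sketch: sum one int per rank into a total on the root (rank 0). */
int rank, local, total;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local = rank + 1;                   /* ranks contribute 1, 2, 3, ... */
MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("Sum = %d\n", total);    /* recvbuf is defined only on root */
```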

MPI Allreduce

Usage

int MPI_Allreduce (void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

Function

  • Same as MPI_Reduce, but the result is delivered to every process (as if followed by a broadcast); note there is no root argument
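A sketch: every rank contributes a value, and every rank receives the sum:

```c
/* Sketch: all-reduce of one int per rank; every rank gets the total. */
int rank, local, total;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local = rank + 1;                   /* arbitrary per-rank contribution */
MPI_Allreduce(&local, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
printf("Rank %d sees sum = %d\n", rank, total);
```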

MPI Scatter

Usage

int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
  • sendcount: the number of elements sent to each process
  • recvcount: the number of elements received by each process

Function

  • A collective communication call that scatters data from the root process to all processes (including the root itself)
  • Every process receives the same amount of data
  • Every process in the communicator must call this function for it to work
The root's sendbuf needs: 4 processes × 2 elements = 8 elements in total
                                    (size × sendcount)

sendbuf: [0, 1, 2, 3, 4, 5, 6, 7]
          └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘
            ↓     ↓     ↓     ↓
           P0    P1    P2    P3
         (each process receives 2 elements)
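The diagram above can be sketched in code (assumes exactly 4 processes):

```c
/* Sketch: root scatters 8 ints, 2 per rank. */
int rank;
int sendbuf[8] = {0, 1, 2, 3, 4, 5, 6, 7};  /* significant on root only */
int recvbuf[2];
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

MPI_Scatter(sendbuf, 2, MPI_INT,            /* 2 elements sent per process */
            recvbuf, 2, MPI_INT,            /* 2 elements received each */
            0, MPI_COMM_WORLD);
printf("Rank %d got %d, %d\n", rank, recvbuf[0], recvbuf[1]);
```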

MPI Gather

Usage

int MPI_Gather (void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

Function

  • The opposite of MPI_Scatter: each process sends its data to the root, which assembles the pieces in rank order
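A sketch of a gather, the reverse of a 2-elements-per-rank scatter (assumes exactly 4 processes):

```c
/* Sketch: each rank contributes 2 ints; root assembles them in rank order. */
int rank, sendbuf[2], recvbuf[8];           /* recvbuf used on root only */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
sendbuf[0] = 2 * rank;
sendbuf[1] = 2 * rank + 1;
MPI_Gather(sendbuf, 2, MPI_INT,
           recvbuf, 2, MPI_INT,             /* 2 elements from each rank */
           0, MPI_COMM_WORLD);
/* On root with 4 processes, recvbuf holds {0, 1, 2, 3, 4, 5, 6, 7}. */
```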

Why use MPI_Scatter and MPI_Gather

A single collective call replaces a loop of MPI_Send/MPI_Recv pairs, and MPI implementations can optimize the communication pattern internally, so collectives are typically faster

MPI Allgather

Usage

int MPI_Allgather (void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm);

Function

  • Allgather is to Gather as Allreduce is to Reduce: the gathered result is delivered to all processes, not just the root (note there is no root argument)
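A sketch: every rank contributes its rank ID, and every rank receives the full gathered array:

```c
/* Sketch: all-gather of one int per rank; every rank gets the whole array. */
#include <stdlib.h>   /* for malloc/free */

int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

int *recvbuf = malloc(size * sizeof(int));
MPI_Allgather(&rank, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);
/* recvbuf holds {0, 1, ..., size-1} on every rank */
free(recvbuf);
```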