MPI Program Structure
```mermaid
flowchart TD
    A["#include &quot;mpi.h&quot;<br/>MPI include file"] --> B[Program begins]
    B --> C["Serial code<br/>────────<br/>MPI_Init<br/>MPI_Comm_size<br/>MPI_Comm_rank"]
    C --> D[Initialize MPI environment]
    D --> E["Parallel code<br/>────────<br/>MPI_Send<br/>MPI_Recv<br/>MPI_Barrier"]
    E --> F[Do work and message passing calls]
    F --> G["Serial code<br/>────────<br/>MPI_Finalize"]
    G --> H[Terminate MPI environment]
    H --> I[Program ends]
    style A fill:#f5f5f5
    style B fill:#fff,stroke-dasharray: 5 5
    style C fill:#f5e6d3
    style D fill:#2d5f3f,color:#fff
    style E fill:#f5e6d3
    style F fill:#2d5f3f,color:#fff
    style G fill:#f5e6d3
    style H fill:#2d5f3f,color:#fff
    style I fill:#fff,stroke-dasharray: 5 5
```
MPI Communicators
Concepts
- An MPI communicator defines a communication channel through which a group of processes can talk to each other
- Most MPI calls require a communicator to be specified
MPI_COMM_WORLD
- A communicator created automatically when MPI is initialized
- It contains all MPI processes as members
MPI Groups
Concept
- A group is an ordered set of processes, each identified by a rank ID
- Groups can be used to create new communicators, opening new communication channels
Usage
```c
// Get the group behind an existing communicator
MPI_Comm comm = MPI_COMM_WORLD;
MPI_Group world_group;
MPI_Comm_group(comm, &world_group);

// Create a new group containing only specific ranks
int ranks[3] = {0, 2, 4};
MPI_Group new_group;
MPI_Group_incl(world_group, 3, ranks, &new_group);

// Create a new communicator from the group
MPI_Comm new_comm;
MPI_Comm_create(comm, new_group, &new_comm);

// Free the groups once they are no longer needed
MPI_Group_free(&new_group);
MPI_Group_free(&world_group);
```
MPI Basic Routines
MPI_Init
Usage
```c
MPI_Init(&argc, &argv);
```
Function
- Must be called in every MPI program
- Only need to call once
MPI_COMM_SIZE
Usage
```c
MPI_Comm_size(MPI_COMM_WORLD, &size);
```
Function
- Returns the total number of processes in the specified communicator
MPI_COMM_RANK
Usage
```c
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
```
Function
- Returns the rank (ID) of the calling process in this communicator
MPI_Wtime
Usage
```c
MPI_Wtime();
```
Function
- Returns the elapsed wall-clock time, in seconds, on the calling process; call it twice and take the difference to measure runtime
MPI_SEND
Usage
```c
MPI_Send(&buf, count, datatype, dest, tag, communicator);
```
- &buf: the address of the data we want to send
- count: the number of elements we want to send
- datatype: can be MPI_CHAR, MPI_SHORT, MPI_INT, MPI_FLOAT, MPI_DOUBLE, etc.
- dest: the rank ID of the destination process
- tag: a non-negative integer identifying the message (matched when the destination receives it)
Function
- Sends the message to the destination
- Blocking
MPI_RECV
Usage
```c
MPI_Recv(&buf, count, datatype, source, tag, communicator, &status);
```
- source: the source rank ID
- status: records information (e.g., source, error) about this transfer
Function
- Receives a message from the source
- Blocking
MPI_Isend
- Asynchronous (non-blocking) version of MPI_Send
MPI_Irecv
- Asynchronous (non-blocking) version of MPI_Recv
MPI_Barrier
Usage
```c
MPI_Barrier(MPI_COMM_WORLD);
```
Function
- Blocks each process until all processes in the communicator have reached the barrier, then all continue executing
MPI_Finalize
Usage
```c
MPI_Finalize();
```
Function
- Terminates the MPI environment
- Ends all MPI communicators
MPI Collective Routine
MPI Broadcast
Usage
```c
MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm);
```
- root: the rank of the broadcast root
Function
- Broadcasts a message from the root to all processes within the specified communicator
MPI Reduce
Usage
```c
int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);
```
- sendbuf: the buffer through which every process provides its input
- recvbuf: only relevant to the root process; it stores the final result of MPI_Reduce
- op: the operation to perform to get a single value
| MPI_OP | Type |
|---|---|
| MPI_MAX | Returns the maximum element |
| MPI_MIN | Returns the minimum element |
| MPI_SUM | Sums the elements |
| MPI_PROD | Multiplies all elements |
| MPI_LAND | Performs a logical AND across the elements |
| MPI_LOR | Performs a logical OR across the elements |
| MPI_BAND | Performs bitwise AND across the elements |
| MPI_BOR | Performs bitwise OR across the elements |
Function
- Reduces values on all processes into a single value
MPI Allreduce
Usage
```c
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);
```
Function
- Same as MPI_Reduce, but broadcasts the result back to all processes
MPI Scatter
Usage
```c
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
```
- sendcount: the number of elements sent to each process
- recvcount: the number of elements received by each process
Function
- A collective communication call that scatters data from the root process to all other processes
- Every process receives the same amount of data
- Every process must call this function for it to work
The root process's sendbuf needs: 4 processes × 2 elements = 8 elements in total
(size × sendcount)
```
sendbuf: [0, 1, 2, 3, 4, 5, 6, 7]
          └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘
            ↓     ↓     ↓     ↓
        P0 gets P1 gets P2 gets P3 gets
           2       2       2       2
```
MPI Gather
Usage
```c
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
```
Function
- Opposite of MPI_Scatter: it gathers data from all processes to the root
Why we use MPI_Scatter and MPI_Gather
- They are faster than the equivalent loops of MPI_Send/MPI_Recv
MPI Allgather
Usage
```c
int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm);
```
Function
- Allgather is to Gather as Allreduce is to Reduce: every process, not just the root, receives the gathered result