Class MPISystem

Class Documentation

class MPISystem

This class encapsulates the access to the different communicators being used.

Each process belongs to different MPI communicators which can overlap. If a process does not belong to a specific communicator the corresponding get function returns MPI_COMM_NULL.

worldComm_: contains all processes. Usually equal to MPI_COMM_WORLD. If fault tolerance is enabled, the processes of groups detected as failed are removed from it.

globalComm_: contains the manager and the master process of each process group

spareCommFT_: contains inactive but reusable ranks. Only used for fault-tolerance simulations.

localComm_: contains the processes of the group a process is assigned to. This is MPI_COMM_NULL for the manager process. This is the communicator the Task uses for its computations.

globalReduceComm_: contains the processes for the global reduction. In the case of equally sized process groups (at the moment this is the only option), this communicator contains all processes which have the same rank in localComm_. It is MPI_COMM_NULL on the manager.

thirdLevelComms_: list of communicators for the thirdLevelCombination. Each communicator connects all workers of a process group to the process manager. The list differs for each caller and contains only the communicators in which the caller participates: the process manager gets all communicators, whereas each worker has a single entry.

getXXXCommFT returns the fault-tolerant equivalent of Communicator XXX
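The communicator memberships described above can be illustrated with a small, MPI-free sketch. It is not taken from the MPISystem implementation: the `RankRole` struct and the `describeRank` helper are hypothetical, and the layout is an assumption (workers in contiguous blocks of nprocs world ranks per group, the manager as the last world rank, local rank 0 as the group master).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one world rank; not part of MPISystem.
struct RankRole {
  bool isManager;      // true only for the manager process
  bool isGroupMaster;  // true for the master of a process group
  int groupNumber;     // -1 for the manager, which belongs to no group
  int localRank;       // -1 for the manager (its local comm is MPI_COMM_NULL)
};

// Assumed layout (an illustration, not confirmed by the documentation):
// workers occupy world ranks [0, ngroups * nprocs) in contiguous blocks of
// nprocs ranks per group, the manager (if any) is the last world rank, and
// local rank 0 acts as the group master.
inline RankRole describeRank(std::size_t worldRank, std::size_t ngroups,
                             std::size_t nprocs,
                             bool withWorldManager = true) {
  const std::size_t numWorkers = ngroups * nprocs;
  assert(nprocs > 0);
  assert(worldRank < numWorkers + (withWorldManager ? 1u : 0u));
  if (withWorldManager && worldRank == numWorkers) {
    return RankRole{true, false, -1, -1};
  }
  const int group = static_cast<int>(worldRank / nprocs);
  const int local = static_cast<int>(worldRank % nprocs);
  return RankRole{false, local == 0, group, local};
}
```

Under these assumptions, a rank would belong to the global communicator exactly when `isManager` or `isGroupMaster` is true, and to a local communicator exactly when `isManager` is false.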

Public Functions

void init(size_t ngroups, size_t nprocs, bool withWorldManager = true)

(re-)initializes MPI system including local, global and global reduce communicator

Parameters:
  • ngroups – number of process groups

  • nprocs – number of MPI processes per process group

  • withWorldManager – true for manager-worker setups, false for worker-only setups
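As a rough illustration of how these parameters relate to the number of MPI processes, the following sketch computes the world size a configuration would need. The helper `expectedWorldSize` is hypothetical and not part of MPISystem; it assumes that exactly one additional rank is reserved for the manager when withWorldManager is true.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of MPISystem: number of ranks the world
// communicator would need for the given configuration, assuming one extra
// rank is reserved for the manager when withWorldManager is true.
inline std::size_t expectedWorldSize(std::size_t ngroups, std::size_t nprocs,
                                     bool withWorldManager = true) {
  return ngroups * nprocs + (withWorldManager ? 1u : 0u);
}
```

A caller could compare this value with the size reported by MPI_Comm_size for the world communicator before calling init, to catch a mismatched launch configuration early.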

void init(size_t ngroups, size_t nprocs, CommunicatorType lcomm, bool withWorldManager = true)

(re-)initializes MPI system including global and global reduce communicator

local communicator is given here and not set by the init procedure

void initWorldReusable(CommunicatorType wcomm, size_t ngroups, size_t nprocs, bool withWorldManager = true, bool verbose = false)

(re-)initializes MPI system including world communicator, local, global and global reduce communicators

Parameters:
  • wcomm – world communicator

  • ngroups – number of process groups

  • nprocs – number of MPI processes per process group

  • withWorldManager – true for manager-worker setups, false for worker-only setups

inline const CommunicatorType &getWorldComm() const

returns the world communicator which contains all active ranks (excluding spare ranks)

inline const CommunicatorType &getGlobalComm() const

returns the global communicator which contains all manager and group master ranks

inline const CommunicatorType &getLocalComm() const

returns the local communicator which contains all ranks within the process group of caller

inline RankType getProcessGroupNumber() const

get own process group number

inline const CommunicatorType &getGlobalReduceComm() const

returns the global reduce communicator which contains all ranks with which the calling rank needs to communicate in the global allreduce step

All of these ranks are responsible for the same area in the domain, and have the same rank in their respective process groups / local communicators
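The membership rule above (same local rank in every process group) can be sketched without MPI. The helper `globalReduceMembers` is hypothetical, and the mapping from group and local rank to world rank assumes contiguous blocks of nprocs world ranks per group, which the documentation does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MPISystem: lists the world ranks that
// would share one global reduce communicator, i.e. the ranks holding the
// same localRank in each of the ngroups process groups.
// Assumption: group g occupies world ranks [g * nprocs, (g + 1) * nprocs).
inline std::vector<std::size_t> globalReduceMembers(std::size_t localRank,
                                                    std::size_t ngroups,
                                                    std::size_t nprocs) {
  assert(localRank < nprocs);
  std::vector<std::size_t> members;
  members.reserve(ngroups);
  for (std::size_t g = 0; g < ngroups; ++g) {
    members.push_back(g * nprocs + localRank);  // one member per group
  }
  return members;
}
```

With 3 groups of 4 processes, local rank 1 would be grouped with world ranks 1, 5 and 9. In this sketch, the position of a member in the returned list equals its process group number.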

inline const CommunicatorType &getOutputGroupComm() const

returns the (diagonally assigned) communicator for the file-based widely-distributed output

returns MPI_COMM_NULL if initOutputGroupComm was not called

inline const CommunicatorType &getOutputComm() const

returns a sub-communicator of the OutputGroupComm, if partitioned file output is used

returns MPI_COMM_NULL if initOutputGroupComm was not called with numFileParts > 1 or was not called at all

inline RankType getFilePartNumber() const

returns the output file partition number of the calling rank

should only be called from ranks in the output group

inline const std::vector<CommunicatorType> &getThirdLevelComms() const

inline simft::Sim_FT_MPI_Comm getWorldCommFT()

returns the fault tolerant version of the world comm (excluding spare ranks)

inline simft::Sim_FT_MPI_Comm getSpareCommFT()

returns the communicator containing spare processors (or ranks)

inline simft::Sim_FT_MPI_Comm getGlobalCommFT()

returns the fault tolerant version of the global comm

inline simft::Sim_FT_MPI_Comm getLocalCommFT()

returns the fault tolerant version of the local comm

inline simft::Sim_FT_MPI_Comm getGlobalReduceCommFT()

returns the fault tolerant version of the allreduce comm

inline const RankType &getWorldRank() const

returns MPI rank number in world comm

inline RankType getWorldSize() const

get the size of the world communicator

inline const RankType &getGlobalRank() const

returns MPI rank number in global comm

inline const RankType &getLocalRank() const

returns MPI rank number in local comm

inline const RankType &getGlobalReduceRank() const

returns MPI rank number in global reduce comm

should be the same as the process group number of the calling rank

inline const RankType &getOutputGroupRank() const

returns MPI rank number in output group comm

inline RankType getOutputRankInGlobalReduceComm() const

returns the rank of the output rank in the global reduce communicator, or MPI_PROC_NULL if the output group was not set with initOutputGroupComm

inline const RankType &getThirdLevelRank() const

returns MPI rank number in all third level comms

inline const RankType &getManagerRankWorld() const

returns MPI rank number of manager in world comm

inline const RankType &getManagerRank() const

returns MPI rank number of manager in global comm

inline const RankType &getMasterRank() const

returns MPI rank number of master in local comm

inline const RankType &getThirdLevelManagerRank() const

returns MPI rank number of manager in all third level comms

inline bool isWorldManager() const

returns boolean that indicates if caller is manager in world comm

inline bool isThirdLevelManager() const

returns boolean that indicates if caller is manager in all third level comms

inline bool isProcessGroupMaster() const

returns boolean that indicates if caller is master in local comm

inline size_t getNumGroups() const

returns the number of process groups

inline size_t getNumProcs() const

returns the number of processors per process group

inline bool isInitialized() const

returns boolean that indicates if MPISystem is initialized

bool recoverCommunicators(bool groupAlive, size_t numFailedGroups = 0)

starts the fault-tolerance recovery procedure.

groupAlive indicates whether the process group of the calling rank is alive. numFailedGroups is the number of failed process groups.

void deleteCommFT(simft::Sim_FT_MPI_Comm *comm)

This routine frees the specified fault tolerant MPI communicator. The corresponding non-fault tolerant communicator associated with the FT-communicator is not destroyed!

void deleteCommFTAndCcomm(simft::Sim_FT_MPI_Comm *commFT, CommunicatorType *ccommCopy)

This routine frees the specified fault tolerant MPI communicator. The corresponding non-fault tolerant communicator associated with the FT-communicator is also destroyed!

void deleteCommFTAndCcomm(simft::Sim_FT_MPI_Comm *comm)

This routine frees the specified fault tolerant MPI communicator. The corresponding non-fault tolerant communicator associated with the FT-communicator is also destroyed!

void sendFailedSignal()

sends a message to the manager that this rank has failed; used for the FT simulator

void storeLocalComm(CommunicatorType lcomm)

stores local comm + FT version if FT_ENABLED

void initOutputGroupComm(uint16_t numFileParts = 1)

let the output “group” be distributed across the actual process groups

Parameters:
  • numFileParts – number of file partitions to distribute the output across; if 1, the output is not partitioned