Class MPISystem
Defined in File MPISystem.hpp
Class Documentation
-
class MPISystem
This class encapsulates the access to the different communicators being used.
Each process belongs to different MPI communicators which can overlap. If a process does not belong to a specific communicator the corresponding get function returns MPI_COMM_NULL.
worldComm_: contains all processes. Usually equal to MPI_COMM_WORLD. If fault tolerance is enabled, the processes of groups detected as failed are removed.
globalComm_: contains the manager and the master process of each process group
spareCommFT_: contains inactive but reusable ranks. Only used for fault-tolerance simulations.
localComm_: contains the processes of the group a process is assigned to. This is MPI_COMM_NULL for the manager process. This is the communicator the Task uses for its computations.
globalReduceComm_: contains the processes for the global reduction. With equally sized process groups (currently the only option), this communicator contains all processes that have the same rank in localComm_. It is MPI_COMM_NULL on the manager.
thirdLevelComms_: list of communicators for the thirdLevelCombination. Each communicator connects all workers of a process group to the process manager. The list differs for each caller and contains only the communicators in which the caller participates: the process manager gets all communicators, whereas each worker has a single entry.
getXXXCommFT returns the fault-tolerant equivalent of Communicator XXX
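The membership rules above can be illustrated without MPI. The following is a minimal sketch, not the class's implementation: `CommMembership` and `describeRank` are hypothetical names, and the layout is an assumption (world ranks are numbered group by group, the manager is the last world rank, and the group master is local rank 0). A value of -1 stands in for MPI_COMM_NULL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration type; not part of MPISystem.
struct CommMembership {
  bool isManager;                 // assumed: the manager is the last world rank
  bool inGlobalComm;              // manager and group masters only
  long localColor;                // process group index, -1 on the manager
  long globalReduceColor;         // rank inside the group, -1 on the manager
  std::size_t numThirdLevelComms; // manager: one per group, worker: exactly one
};

// Derives the communicator membership of one world rank under the
// assumed layout (ngroups * nprocs workers followed by one manager).
inline CommMembership describeRank(std::size_t worldRank, std::size_t ngroups,
                                   std::size_t nprocs) {
  assert(nprocs > 0 && worldRank <= ngroups * nprocs);
  const std::size_t managerRank = ngroups * nprocs;
  if (worldRank == managerRank) {
    return {true, true, -1, -1, ngroups};
  }
  const std::size_t group = worldRank / nprocs;
  const std::size_t localRank = worldRank % nprocs;
  return {false, localRank == 0, static_cast<long>(group),
          static_cast<long>(localRank), 1};
}
```

With 2 groups of 4 processes, `describeRank(5, 2, 4)` describes a worker in group 1 with local rank 1 that is not part of the global communicator, while `describeRank(8, 2, 4)` describes the manager.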
Public Functions
-
void init(size_t ngroups, size_t nprocs, bool withWorldManager = true)
(re-)initializes MPI system including local, global and global reduce communicator
- Parameters:
ngroups – number of process groups
nprocs – number of MPI processes per process group
withWorldManager – true for manager-worker setups, false for worker-only setups
-
void init(size_t ngroups, size_t nprocs, CommunicatorType lcomm, bool withWorldManager = true)
(re-)initializes MPI system including global and global reduce communicator
local communicator is given here and not set by the init procedure
-
void initWorldReusable(CommunicatorType wcomm, size_t ngroups, size_t nprocs, bool withWorldManager = true, bool verbose = false)
(re-)initializes MPI system including world, local, global and global reduce communicators
- Parameters:
wcomm – world communicator
ngroups – number of process groups
nprocs – number of MPI processes per process group
withWorldManager – true for manager-worker setups, false for worker-only setups
-
inline const CommunicatorType &getWorldComm() const
returns the world communicator which contains all active ranks (excluding spare ranks)
-
inline const CommunicatorType &getGlobalComm() const
returns the global communicator which contains all manager and group master ranks
-
inline const CommunicatorType &getLocalComm() const
returns the local communicator which contains all ranks within the process group of caller
-
inline const CommunicatorType &getGlobalReduceComm() const
returns the global reduce communicator, which contains all ranks with which the calling rank needs to communicate in the global allreduce step
All of these ranks are responsible for the same area in the domain and have the same rank in their respective process groups / local communicators
-
inline const CommunicatorType &getOutputGroupComm() const
returns the (diagonally assigned) communicator for the file-based widely-distributed output
returns MPI_COMM_NULL if initOutputGroupComm was not called
-
inline const CommunicatorType &getOutputComm() const
returns a sub-communicator of the OutputGroupComm, if partitioned file output is used
returns MPI_COMM_NULL if initOutputGroupComm was not called with numFileParts > 1 or was not called at all
-
inline RankType getFilePartNumber() const
returns the output file partition number of the calling rank
should only be called from ranks in the output group
-
inline const std::vector<CommunicatorType> &getThirdLevelComms() const
-
inline simft::Sim_FT_MPI_Comm getWorldCommFT()
returns the fault tolerant version of the world comm (excluding spare ranks)
-
inline simft::Sim_FT_MPI_Comm getSpareCommFT()
returns the communicator containing spare processors (or ranks)
-
inline simft::Sim_FT_MPI_Comm getGlobalCommFT()
returns the fault tolerant version of the global comm
-
inline simft::Sim_FT_MPI_Comm getLocalCommFT()
returns the fault tolerant version of the local comm
-
inline simft::Sim_FT_MPI_Comm getGlobalReduceCommFT()
returns the fault tolerant version of the allreduce comm
-
inline const RankType &getGlobalReduceRank() const
returns MPI rank number in global reduce comm
should be the same as the process group number of the calling rank
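The relation between the global reduce rank and the process group number can be sketched in plain C++. This is an illustration under assumptions, not the library's code: `globalReducePeers` and `globalReduceRankOf` are hypothetical helpers, world ranks are assumed to be numbered group by group, and ranks inside the communicator are assumed to be ordered by world rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// World ranks that share a global reduce communicator with worldRank:
// one rank per process group, all with the same local rank (assumed layout).
inline std::vector<std::size_t> globalReducePeers(std::size_t worldRank,
                                                  std::size_t ngroups,
                                                  std::size_t nprocs) {
  assert(nprocs > 0 && worldRank < ngroups * nprocs);
  const std::size_t localRank = worldRank % nprocs;
  std::vector<std::size_t> peers;
  peers.reserve(ngroups);
  for (std::size_t group = 0; group < ngroups; ++group) {
    peers.push_back(group * nprocs + localRank);
  }
  return peers;
}

// Position of worldRank inside that peer list; under the assumed ordering
// this equals the process group number of the rank.
inline std::size_t globalReduceRankOf(std::size_t worldRank, std::size_t nprocs) {
  assert(nprocs > 0);
  return worldRank / nprocs;
}
```

For 3 groups of 4 processes, world rank 5 shares its global reduce communicator with world ranks 1, 5 and 9, and sits at position 1, which is also its group number.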
-
inline RankType getOutputRankInGlobalReduceComm() const
returns the rank of the output rank in the global reduce communicator, or MPI_PROC_NULL if the output group was not set with initOutputGroupComm
-
inline const RankType &getManagerRankWorld() const
returns MPI rank number of manager in world comm
-
inline const RankType &getThirdLevelManagerRank() const
returns MPI rank number of master in all third level comms
-
inline bool isWorldManager() const
returns boolean that indicates if caller is manager in world comm
-
inline bool isThirdLevelManager() const
returns boolean that indicates if caller is manager in all third level comms
-
inline bool isProcessGroupMaster() const
returns boolean that indicates if caller is master in local comm
-
inline size_t getNumGroups() const
returns the number of process groups
-
inline size_t getNumProcs() const
returns the number of processors per process group
-
bool recoverCommunicators(bool groupAlive, size_t numFailedGroups = 0)
starts the fault-tolerance recovery procedure.
groupAlive indicates whether the process group of the calling rank is alive. failedGroups is a vector of the failed process groups.
-
void deleteCommFT(simft::Sim_FT_MPI_Comm *comm)
This routine frees the specified fault-tolerant MPI communicator. The corresponding non-fault-tolerant communicator associated with the FT communicator is not destroyed!
-
void deleteCommFTAndCcomm(simft::Sim_FT_MPI_Comm *commFT, CommunicatorType *ccommCopy)
This routine frees the specified fault-tolerant MPI communicator. The corresponding non-fault-tolerant communicator associated with the FT communicator is also destroyed!
-
void deleteCommFTAndCcomm(simft::Sim_FT_MPI_Comm *comm)
This routine frees the specified fault-tolerant MPI communicator. The corresponding non-fault-tolerant communicator associated with the FT communicator is also destroyed!
-
void sendFailedSignal()
sends a message to the manager that this rank has failed (used by the FT simulator)
-
void storeLocalComm(CommunicatorType lcomm)
stores local comm + FT version if FT_ENABLED
-
void initOutputGroupComm(uint16_t numFileParts = 1)
distributes the output “group” across the actual process groups
- Parameters:
numFileParts – number of file partitions to distribute the output across, if 1, then the output is not partitioned