Performance of MPI sends of non-contiguous data

09/27/2018
by Victor Eijkhout, et al.

We present an experimental investigation of the performance of MPI derived datatypes. For messages up to the megabyte range, most schemes perform comparably to each other and to manual copying into a regular send buffer. For larger messages, however, MPI's internal buffering causes differences in efficiency. The optimal scheme is a combination of packing and derived types.
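
To make the baseline concrete, the following is a minimal sketch of "manual copying into a regular send buffer" for a strided layout. The helper name `pack_strided` and its parameters are hypothetical and are not taken from the paper. The sketch assumes the non-contiguous data consists of `count` doubles spaced `stride` elements apart. It deliberately contains no MPI calls so that it compiles without an MPI installation.

```c
#include <stddef.h>

/* Hypothetical helper (not from the paper): gather `count` doubles that lie
 * `stride` elements apart in `src` into the contiguous buffer `dst`.
 * `dst` must have room for at least `count` doubles.
 * Returns the number of bytes written to `dst`. */
size_t pack_strided(const double *src, size_t count, size_t stride,
                    double *dst)
{
    for (size_t i = 0; i < count; i++) {
        /* Element i of the message is element i*stride of the source. */
        dst[i] = src[i * stride];
    }
    return count * sizeof(double);
}
```

In an MPI program, the filled `dst` buffer could then be sent as `count` elements of `MPI_DOUBLE` with an ordinary `MPI_Send`. The derived-datatype alternative would describe the same layout with `MPI_Type_vector(count, 1, stride, MPI_DOUBLE, &newtype)` followed by `MPI_Type_commit`, and leave the gathering to the MPI library.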

research
06/28/2022

Lessons Learned on MPI+Threads Communication

Hybrid MPI+threads programming is gaining prominence, but, in practice, ...
research
08/22/2019

Network-Accelerated Non-Contiguous Memory Transfers

Applications often communicate data that is non-contiguous in the send- ...
research
12/28/2020

TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes

MPI derived datatypes are an abstraction that simplifies handling of non...
research
05/27/2021

Measuring OpenSHMEM Communication Routines with SKaMPI-OpenSHMEM User's manual

This document presents the OpenSHMEM extension for the Special Karlsruhe...
research
08/07/2023

Quantifying the Performance Benefits of Partitioned Communication in MPI

Partitioned communication was introduced in MPI 4.0 as a user-friendly i...
research
10/25/2018

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

TensorFlow has been the most widely adopted Machine/Deep Learning framew...
research
05/27/2022

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

This paper studies the utility of using data analytics and machine learn...