Communication-Optimal Parallel Standard and Karatsuba Integer Multiplication in the Distributed Memory Model
We present COPSIM a parallel implementation of standard integer multiplication for the distributed memory setting, and COPK a parallel implementation of Karatsuba's fast integer multiplication algorithm for a distributed memory setting. When using 𝒫 processors, each equipped with a local non-shared memory, to compute the product of tho n-digits integer numbers, under mild conditions, our algorithms achieve optimal speedup of the computational time. That is, 𝒪(n^2/𝒫) for COPSIM, and 𝒪(n^log_2 3/𝒫) for COPK. The total amount of memory required across the processors is 𝒪(n), that is, within a constant factor of the minimum space required to store the input values. We rigorously analyze the Input/Output (I/O) cost of the proposed algorithms. We show that their bandwidth cost (i.e., the number of memory words sent or received by at least one processors) matches asymptotically corresponding known I/O lower bounds, and their latency (i.e., the number of messages sent or received in the algorithm's critical execution path) is asymptotically within a multiplicative factor 𝒪(log^2_2 𝒫) of the corresponding known I/O lower bounds. Hence, our algorithms are asymptotically optimal with respect to the bandwidth cost and almost asymptotically optimal with respect to the latency cost.
READ FULL TEXT