Scheduling and Tiling Reductions on Realistic Machines
Computations in which the number of results is much smaller than the size of the input data, and in which those results are produced through some form of accumulation, are called reductions. Reductions appear in many scientific applications. Usually, the accumulation admits an associative and commutative binary operator, so reductions are highly parallel. Given unbounded fan-in, one can execute a reduction in constant/linear time, provided that the data is available. However, real machines have bounded fan-in, so an accumulation cannot be performed in one time step and has to be broken into parts. Thus, a (partial) serialization of reductions becomes necessary, which makes scheduling reductions a difficult and interesting problem. A number of research works have addressed the scheduling of reductions. We focus on the scheduling techniques presented by Gupta et al., identify a potential issue in their scheduling algorithm, and provide a solution. In addition, we demonstrate how these scheduling techniques can be extended to "tile" reductions, and we briefly survey other studies that address the problem of scheduling reductions.
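The effect of bounded fan-in can be illustrated with a small sketch. It assumes a simple cost model that is not taken from the paper: each node combines up to `fan_in` operands with an associative, commutative operator in one time step, and all nodes at the same tree level run in parallel. Under that model, reducing n values needs ceil(log_fan_in n) steps, and a single step only when the fan-in is at least n (the unbounded case). The function name `bounded_fanin_reduce` is hypothetical and is not the scheduling algorithm of Gupta et al.

```python
import operator
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def bounded_fanin_reduce(values: List[T], op: Callable[[T, T], T],
                         fan_in: int) -> Tuple[T, int]:
    """Hypothetical sketch: reduce `values` as a tree in which each node
    combines at most `fan_in` operands. Assumes `op` is associative and
    commutative and that one tree level costs one time step.
    Returns (result, number_of_parallel_steps)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not values:
        raise ValueError("cannot reduce an empty sequence")

    level = list(values)
    steps = 0
    # Each pass over `level` models one parallel time step: the operands are
    # split into groups of at most `fan_in`, and each group is accumulated.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            acc = group[0]
            for v in group[1:]:
                acc = op(acc, v)
            next_level.append(acc)
        level = next_level
        steps += 1
    return level[0], steps


if __name__ == "__main__":
    data = list(range(1, 17))
    for k in (2, 4, 16):
        total, steps = bounded_fanin_reduce(data, operator.add, k)
        print(f"fan-in {k}: sum={total}, steps={steps}")
```

For the 16 inputs in the demo, the sum is 136 in every case, but the number of steps drops from 4 (fan-in 2) to 2 (fan-in 4) to 1 (fan-in 16). This shows why a bounded fan-in forces the partial serialization described above.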