On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers
We study the expected completion time of recently proposed distributed computing algorithms that redundantly assign computing tasks to multiple machines in order to tolerate a certain number of machine failures. We show analytically that the latency of a distributed system depends not only on the amount of redundancy but also on the task-to-machine assignment. We consider systems in which a fixed number of computing tasks is split into possibly overlapping batches and machine service times are independent and exponentially distributed. For such systems, we show that uniform replication of non-overlapping (disjoint) batches of computing tasks achieves the minimum expected computing time.
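The comparison between assignments can be illustrated with a small Monte Carlo sketch. It is not the paper's analysis, and everything concrete in it is an assumption made for illustration: 6 tasks, 6 machines, batches of 2 tasks, i.i.d. Exp(1) machine service times that do not depend on which batch a machine holds, and the helper names `completion_time`, `mean_completion_time`, and `harmonic`. The job is taken to finish at the first instant when the machines that have finished jointly cover every task. For B disjoint batches, each replicated on r machines with rate mu, each batch finishes after an Exp(r*mu) time, so the expected completion time is H_B / (r*mu), where H_B is the B-th harmonic number.

```python
import random


def completion_time(assignment, finish_times, num_tasks):
    """Earliest time at which the finished machines jointly cover every task."""
    covered = set()
    # Visit machines in the order in which they finish.
    for t, tasks in sorted(zip(finish_times, assignment), key=lambda p: p[0]):
        covered.update(tasks)
        if len(covered) == num_tasks:
            return t
    raise ValueError("assignment does not cover all tasks")


def mean_completion_time(assignment, num_tasks, rate=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the expected completion time.

    Assumption: each machine's service time is an independent Exp(rate) draw.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [rng.expovariate(rate) for _ in assignment]
        total += completion_time(assignment, times, num_tasks)
    return total / trials


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


NUM_TASKS = 6  # hypothetical problem size

# Disjoint batches {0,1}, {2,3}, {4,5}, each replicated on two machines.
disjoint = [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {4, 5}, {4, 5}]

# Overlapping (cyclic) batches: machine i holds tasks i and i+1 (mod 6).
cyclic = [{i, (i + 1) % NUM_TASKS} for i in range(NUM_TASKS)]

# The same seed gives both assignments the same per-machine service times.
mean_disjoint = mean_completion_time(disjoint, NUM_TASKS)
mean_cyclic = mean_completion_time(cyclic, NUM_TASKS)

# Closed form for the disjoint case: B = 3 batches, r = 2 replicas, mu = 1.
closed_form = harmonic(3) / (2 * 1.0)

print("disjoint (simulated):  ", mean_disjoint)
print("disjoint (closed form):", closed_form)
print("cyclic   (simulated):  ", mean_cyclic)
```

Both assignments give every task two replicas and every machine two tasks, so they differ only in how the batches overlap. In this particular instance the cyclic scheme can never finish earlier than the disjoint one for the same service times, because every pair of machines sharing a disjoint batch is also the only pair holding some task in the cyclic scheme.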