Enabling Distributed-Memory Tensor Completion in Python using New Sparse Tensor Kernels
Tensor computations are increasingly prevalent numerical techniques in data science. However, innovation and deployment of methods on large sparse tensor datasets are hampered by the difficulty of implementing them efficiently. We provide a Python extension to the Cyclops tensor algebra library, which fully automates the management of distributed-memory parallelism and sparsity for NumPy-style operations on multidimensional arrays. We showcase this functionality with novel high-level implementations of three algorithms for the tensor completion problem: alternating least squares (ALS) with an implicit conjugate gradient method, stochastic gradient descent (SGD), and coordinate descent (CCD++). To make tensor completion possible for very sparse tensors, we introduce a new multi-tensor routine that is asymptotically more efficient than pairwise tensor contraction for key components of the tensor completion methods. Further, we add support for hypersparse matrix representations to Cyclops. We provide microbenchmarking results on the Stampede2 supercomputer to demonstrate the efficiency of this functionality. Finally, we study the accuracy and performance of the tensor completion methods for a synthetic tensor with 10 billion nonzeros and the Netflix dataset.
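The abstract does not spell out the multi-tensor routine, so the following is only a minimal single-node NumPy sketch of the general idea it points to: evaluating a low-rank (CP) model at the observed entries of a sparse tensor in one pass over all factor matrices, rather than through pairwise contractions that build large intermediates. It does not use the Cyclops API. The function name `sampled_cp_values` and the tensor sizes are hypothetical choices made for illustration.

```python
import numpy as np

def sampled_cp_values(indices, factors):
    """Evaluate a rank-R CP model only at the given index tuples.

    indices: (nnz, N) integer array, one row per observed entry.
    factors: list of N factor matrices, the n-th of shape (I_n, R).
    Cost is O(nnz * N * R); no dense tensor is ever formed.
    """
    nnz = indices.shape[0]
    rank = factors[0].shape[1]
    prod = np.ones((nnz, rank))
    for mode, factor in enumerate(factors):
        # Gather the factor rows selected by each entry's index in this mode
        # and accumulate their elementwise product.
        prod *= factor[indices[:, mode]]
    # Sum over the rank dimension to get one model value per observed entry.
    return prod.sum(axis=1)

# Small illustrative problem (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
shape, rank, nnz = (6, 5, 4), 3, 10
factors = [rng.standard_normal((dim, rank)) for dim in shape]
indices = np.column_stack([rng.integers(0, dim, size=nnz) for dim in shape])

sampled = sampled_cp_values(indices, factors)

# Reference: form the full dense model, then sample it. This costs
# O(I * J * K * R) and is only feasible here because the tensor is tiny.
dense = np.einsum("ir,jr,kr->ijk", *factors)
reference = dense[indices[:, 0], indices[:, 1], indices[:, 2]]
```

For a very sparse tensor the number of observed entries is far smaller than the number of tensor positions, which is why touching only the observed entries matters. The residuals needed by ALS, SGD, and CCD++ style updates can be formed from values computed this way.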