ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs
Linear algebra operations are widely used in big data analytics and scientific computing. Much work has been done on optimizing linear algebra operations on GPUs with regular-shaped input, but few efforts focus on fully utilizing GPU resources when the input is irregular-shaped. Current optimizations do not fully exploit the available memory bandwidth and computing power, so they achieve only sub-optimal performance. In this paper, we propose two efficient irregular-shaped matrix-matrix multiplication (GEMM) algorithms on GPUs, called TSM2 and ISM2. Both focus on optimizing GEMMs with various input sizes where at least one of the matrices is tall-and-skinny. We implement our proposed algorithms and test them on several modern Nvidia GPU micro-architectures. Experiments show that, compared with the state of the art, our TSM2 speeds up the computation by 1.1x-3x and improves memory bandwidth utilization and computing power utilization by 8% and 7%, respectively, when the input size is large or medium. Moreover, our ISM2 speeds up the GEMM by 1.1x-3.5x and improves memory bandwidth utilization by up to 55% when the input size is relatively small.
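For illustration only (this is not the paper's CUDA implementation), a tall-and-skinny GEMM is one where one input matrix has one dimension much larger than the other; the shapes below are hypothetical examples of such an input. A minimal NumPy sketch:

```python
import numpy as np

# Hypothetical tall-and-skinny GEMM shapes (illustrative only):
# A is n x k with n >> k (tall-and-skinny); B is a small k x k matrix.
n, k = 100_000, 16
rng = np.random.default_rng(0)
A = rng.random((n, k))   # tall-and-skinny input matrix
B = rng.random((k, k))   # small regular matrix

C = A @ B                # output is also tall-and-skinny: n x k
assert C.shape == (n, k)
```

Because k is tiny relative to n, such multiplications are memory-bandwidth bound rather than compute bound, which is why standard GEMM kernels tuned for large square matrices underutilize the GPU on these shapes.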