Optimized Vectorization Implementation of CRYSTALS-Dilithium
CRYSTALS-Dilithium is a lattice-based signature scheme to be standardized by NIST as the primary post-quantum signature algorithm. In this work, we make a thorough study of optimizing the implementations of Dilithium by utilizing the Advanced Vector Extension (AVX) instructions, specifically AVX2 and the latest AVX512. We first present an improved parallel small polynomial multiplication with tailored early evaluation (PSPM-TEE) to further speed up the signing procedure, which results in a speedup of 5%-6% compared with the original PSPM Dilithium implementation. We then present a tailored reduction method that is simpler and faster than Montgomery reduction. Our optimized AVX2 implementation exhibits a speedup of 3%-8% compared with the state-of-the-art of Dilithium AVX2 software. Finally, for the first time, we propose a fully and highly vectorized implementation of Dilithium using AVX-512. This is achieved by carefully vectorizing most of Dilithium functions with the AVX512 instructions in order to improve efficiency both for time and for space simultaneously. With all the optimization efforts, our AVX-512 implementation improves the performance by 37.3%/50.7%/39.7% in key generation, 34.1%/37.1%/42.7% in signing, and 38.1%/38.7%/40.7% in verification for the parameter sets of Dilithium2/3/5 respectively. To the best of our knowledge, our AVX512 implementation has the best performance for Dilithium on the Intel x64 CPU platform to date.
READ FULL TEXT