Fast matrix-free evaluation of discontinuous Galerkin finite element operators
We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators based on sum factorization on quadrilateral and hexahedral meshes. We identify a set of kernels for fast quadrature on cells and faces targeting a wide class of weak forms originating from linear and nonlinear partial differential equations. Different algorithms and data structures for implementing the operator evaluation are compared in an in-depth performance analysis. The sum factorization kernels are optimized by vectorization over several cells and faces and by an even-odd decomposition of the one-dimensional compute kernels. Measured in isolation, the optimized kernels reach up to 60% of arithmetic peak on Intel Haswell and Broadwell processors and up to 50% of arithmetic peak on Intel Knights Landing. The full operator evaluation reaches only about half that throughput because of memory bandwidth limitations from loading the input and output vectors, MPI ghost exchange, and handling variable coefficients and the geometry. Our performance analysis shows that the results for the proposed implementation are often within 10% of the available memory bandwidth, except in the Cartesian mesh case, where the cost of gather operations and MPI communication is more substantial.
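The basic idea of sum factorization can be illustrated with a minimal sketch, written in plain Python for readability rather than as a model of the optimized, vectorized kernels described above. It assumes a 2D tensor-product cell with n×n nodal coefficients `u[i][j]` and a hypothetical 1D interpolation matrix `S` of size q×n that maps nodal values to q quadrature points per direction; all function names are illustrative. Applying the full Kronecker product S⊗S to reach all q×q points costs O(q²n²) operations, while two sweeps of the 1D kernel, one per direction, cost O(qn² + q²n).

```python
import random


def contract_dir1(S, u):
    """t[i][b] = sum_j S[b][j] * u[i][j]: apply the 1D matrix along the second index."""
    return [[sum(S[b][j] * row[j] for j in range(len(row))) for b in range(len(S))]
            for row in u]


def contract_dir0(S, t):
    """v[a][b] = sum_i S[a][i] * t[i][b]: apply the 1D matrix along the first index."""
    n, m = len(t), len(t[0])
    return [[sum(S[a][i] * t[i][b] for i in range(n)) for b in range(m)]
            for a in range(len(S))]


def interpolate_sum_factorized(S, u):
    """Two 1D sweeps: O(q n^2 + q^2 n) multiply-adds."""
    return contract_dir0(S, contract_dir1(S, u))


def interpolate_naive(S, u):
    """Full tensor-product evaluation: O(q^2 n^2) multiply-adds."""
    q, n = len(S), len(u)
    return [[sum(S[a][i] * S[b][j] * u[i][j] for i in range(n) for j in range(n))
             for b in range(q)] for a in range(q)]


if __name__ == "__main__":
    random.seed(0)
    n, q = 4, 5  # hypothetical: 4 nodes and 5 quadrature points per direction
    S = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(q)]
    u = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    A, B = interpolate_naive(S, u), interpolate_sum_factorized(S, u)
    print(max(abs(A[a][b] - B[a][b]) for a in range(q) for b in range(q)))
```

In 3D a third sweep is added, so the cost per cell drops from O(q³n³) for the full tensor product to O(qn³ + q²n² + q³n), i.e. roughly O(k⁴) instead of O(k⁶) operations for polynomial degree k.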
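The even-odd decomposition of a 1D kernel can be sketched as follows, assuming a Lagrange basis on nodes that are symmetric about the element midpoint, evaluated at points that are also symmetric, so that `S[q-1-a][n-1-i] == S[a][i]`. The equispaced nodes and midpoint-rule points used in the usage lines are a hypothetical choice for illustration only, and all function names are illustrative. Splitting the input into even and odd parts `u[i] ± u[n-1-i]` and the matrix into `E = (S[a][i] + S[a][n-1-i]) / 2` and `O = (S[a][i] - S[a][n-1-i]) / 2` yields both `v[a]` and `v[q-1-a]` from the same two half-size sums, which roughly halves the multiply-adds of a q×n matrix-vector product.

```python
def lagrange_matrix(nodes, points):
    """S[a][i] = value of the i-th Lagrange basis polynomial on `nodes` at points[a]."""
    S = []
    for p in points:
        row = []
        for i, xi in enumerate(nodes):
            val = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    val *= (p - xj) / (xi - xj)
            row.append(val)
        S.append(row)
    return S


def apply_naive(S, u):
    """Plain q x n matrix-vector product."""
    return [sum(s * x for s, x in zip(row, u)) for row in S]


def even_odd_split(S):
    """Keep only the first ceil(q/2) rows, split into even (E) and odd (O) halves."""
    q, n = len(S), len(S[0])
    h, rows = n // 2, (q + 1) // 2
    E = [[0.5 * (S[a][i] + S[a][n - 1 - i]) for i in range(h)] for a in range(rows)]
    O = [[0.5 * (S[a][i] - S[a][n - 1 - i]) for i in range(h)] for a in range(rows)]
    mid = [S[a][h] for a in range(rows)] if n % 2 else None  # middle column for odd n
    return E, O, mid


def apply_even_odd(E, O, mid, u, q):
    """Assumes S[q-1-a][n-1-i] == S[a][i]; returns the same result as apply_naive."""
    n = len(u)
    h = n // 2
    ue = [u[i] + u[n - 1 - i] for i in range(h)]  # even part of the input
    uo = [u[i] - u[n - 1 - i] for i in range(h)]  # odd part of the input
    v = [0.0] * q
    for a in range((q + 1) // 2):
        even = sum(E[a][i] * ue[i] for i in range(h))
        if mid is not None:  # odd n: the middle node only enters the even part
            even += mid[a] * u[h]
        if a == q - 1 - a:  # odd q: the middle point has no odd contribution
            v[a] = even
            continue
        odd = sum(O[a][i] * uo[i] for i in range(h))
        v[a] = even + odd
        v[q - 1 - a] = even - odd
    return v


if __name__ == "__main__":
    n, q = 5, 6
    nodes = [i / (n - 1) for i in range(n)]          # hypothetical symmetric nodes
    points = [(a + 0.5) / q for a in range(q)]       # hypothetical symmetric points
    S = lagrange_matrix(nodes, points)
    u = [0.3 * i * i - 1.1 * i + 0.7 for i in range(n)]
    print(apply_naive(S, u))
    print(apply_even_odd(*even_odd_split(S), u, q))
```

Matrices of basis-function derivatives instead satisfy the antisymmetric relation `D[q-1-a][n-1-i] == -D[a][i]`, which swaps the signs of the two half-size sums when reconstructing `v[q-1-a]`; that variant is not shown here.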