Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture

08/02/2017
by   Christian Knauth, et al.
0

The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime. The template-recursive cache-oblivious method is shown to be competitive with the fastest known method; however, we demonstrate that the cache-oblivious method can more readily benefit from parallelization on multiple cores and on the GPU.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset