Francois Serre and Markus Püschel (Proc. FPGA, pp. 215-223, 2016)
Optimal Circuits for Streamed Linear Permutations using RAM
Preprint (639 KB)
Published paper (link to publisher)
Bibtex

We propose a method to automatically derive hardware structures that perform a fixed linear permutation on streaming data. Linear permutations are permutations that map linearly the bit representation of the elements addresses. This set contains many of the most important permutations in media processing, communication, and other applications and includes perfect shuffles, stride permutations, and the bit reversal. Streaming means that the data to be permuted arrive as a sequence of chunks over several cycles. We solve this problem by mathematically decomposing a given permutation into a sequence of three permutations that are either temporal or spatial. The former are implemented as banks of RAM, the latter as switching networks. We prove optimality of our solution in terms of the number of switches in these networks.

Keywords:
IP cores for FPGA/ASIC, Sorting, Streaming permutations, Fast Fourier Transform, Hardware

More information:

Spiral FFT generator