Markus Püschel, Peter A. Milder and James C. Hoe (Journal of the ACM, Vol. 56, No. 2, pp. 10:1-10:34, 2009)
Permuting Streaming Data Using RAMs
Comment: Submitted for patent application
Published paper (link to publisher)
Bibtex

This paper presents a method for constructing hardware structures to perform permutations on streaming data. The method applies to permutations that can be represented as linear mappings on the bit-level representation of the data locations, a subclass that includes many important permutations such as stride permutations (corner turn, perfect shuffle, etc.), the bit reversal, and the Hadamard reordering. This problem has only been considered in the literature for the special case of stride permutations. We propose a family of parameterized datapaths that consist of several independent banks of memory and two interconnection networks. These structures are built for a given streaming width, i.e., number of inputs and outputs per cycle. The structures are able to operate at full throughput for a given streaming width. We provide an algorithm that completely specifies the datapath and control logic given a permutation and a number of input/output ports. Further, we provide lower bounds on the achievable cost of the solution. For an important subclass of considered permutations, the resulting design is proven optimal in both the cost of control and the number of switching elements required by the interconnection networks. We apply our algorithm to derive complete solutions for several important permutations, including a detailed example that carefully illustrates each aspect of the design process. Lastly, we compare our permutation structures to those of [Jarvinen et al. 2004], which are specialized for stride permutations.

Keywords:
IP cores for FPGA/ASIC, Streaming permutations