Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Markus Püschel, Peter A. Milder and James C. Hoe (Journal of the ACM, Vol. 56, No. 2, pp. 10:1-10:34, 2009)
Permuting Streaming Data Using RAMs
Comment: The designs in this paper are protected by U.S. Patent No. 8,321,823 (assignee: Carnegie Mellon University).
Preprint (511 KB)
Published paper (link to publisher)
Bibtex
This paper presents a method for constructing hardware structures to perform permutations on streaming data. The method applies to permutations that can be represented as linear mappings on the bit-level representation of the data locations, a subclass that includes many important permutations such as stride permutations (corner turn, perfect shuffle, etc.), the bit reversal, and the Hadamard reordering. This problem has only been considered in the literature for the special case of stride permutations. We propose a family of parameterized datapaths that consist of several independent banks of memory and two interconnection networks. These structures are built for a given streaming width, i.e., number of inputs and outputs per cycle. The structures are able to operate at full throughput for a given streaming width. We provide an algorithm that completely specifies the datapath and control logic given a permutation and a number of input/output ports. Further, we provide lower bounds on the achievable cost of the solution. For an important subclass of considered permutations, the resulting design is proven optimal in both the cost of control and the number of switching elements required by the interconnection networks. We apply our algorithm to derive complete solutions for several important permutations, including a detailed example that carefully illustrates each aspect of the design process. Lastly, we compare our permutation structures to those of [Jarvinen et al. 2004], which are specialized for stride permutations.
Keywords: IP cores for FPGA/ASIC, Streaming permutationsMore information: