Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Marcela Zuluaga, Peter A. Milder and Markus Püschel (ACM Transactions on Design Automation of Electronic Systems, Vol. 21, No. 4, pp. 55, 2016)
Streaming Sorting Networks
Sorting is a fundamental problem in computer science and has been studied extensively. Thus, a large variety of sorting methods exists for both software and hardware implementations. For the latter, there is a tradeoff between the throughput achieved and the cost, i.e., the area or amount of logic and storage invested to sort $n$ elements. Two popular solutions are bitonic sorting networks with $O(n\log^2 n)$ logic and storage, which sort $n$ elements per cycle, and linear sorters with $O(n)$ logic and storage, which sort $n$ elements per $n$ cycles. In this paper, we present new hardware structures that we call streaming sorting networks, which we derive through a mathematical formalism that we introduce. With the new networks we achieve novel and improved cost-performance tradeoffs. For example, assuming $n$ is a power of two and $w$ is any divisor of $n$, one class of these networks can sort in $n/w$ cycles with $O(w\log^2 n)$ logic and $O(n\log^2 n)$ storage; the other class we present sorts in $n\log^2 n/w$ cycles with $O(w)$ logic and $O(n)$ storage. We carefully analyze the performance of these networks and their cost at three levels of abstraction: (1) asymptotically, (2) exactly in terms of the number of basic elements needed, and (3) in terms of the resources required by the actual circuit when mapped to a field-programmable gate array. We obtain the latter results through a domain-specific hardware generator that translates our formal mathematical description into synthesizable RTL Verilog. With this generator we explore the entire design space, identify the Pareto-optimal solutions, and show superior cost-performance tradeoffs compared to prior work.
Keywords: Algorithm theory and analysis, IP cores for FPGA/ASIC, Sorting, Streaming permutations
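As a point of reference for the bitonic sorting networks mentioned in the abstract, the following is a minimal software sketch of the classic (fully parallel) bitonic network for $n$ a power of two. It is not the paper's streaming construction or its hardware generator; the function names `bitonic_network` and `apply_network` are hypothetical and chosen only for illustration. The sketch lists the comparators stage by stage, so the $O(n\log^2 n)$ comparator count can be checked directly.

```python
def bitonic_network(n):
    """Return the stages of a bitonic sorting network on n = 2^k wires.

    Each stage is a list of (lo, hi) wire pairs. A comparator (lo, hi)
    leaves the smaller value on wire lo and the larger on wire hi.
    Illustrative sketch only, not the paper's streaming design.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    stages = []
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # comparator distance within this merge step
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    if i & k == 0:
                        stage.append((i, partner))   # ascending block
                    else:
                        stage.append((partner, i))   # descending block
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def apply_network(stages, values):
    """Run the comparator network on a list of values and return the result."""
    a = list(values)
    for stage in stages:
        for lo, hi in stage:
            if a[lo] > a[hi]:
                a[lo], a[hi] = a[hi], a[lo]
    return a


if __name__ == "__main__":
    net = bitonic_network(8)
    print(len(net), sum(len(s) for s in net))
    print(apply_network(net, [5, 3, 7, 1, 8, 2, 6, 4]))
```

For $n = 8$ the network has $\log n(\log n + 1)/2 = 6$ stages of $n/2 = 4$ comparators each, 24 comparators in total. In a fully parallel hardware realization all of these exist at once, which is the cost the streaming variants in the paper trade against the number of cycles.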