Jiyuan Zhang, Yi Lu, Daniele G. Spampinato and Franz Franchetti (Proc. IEEE International Conference on Data Engineering (ICDE), 2020)
FESIA: A Fast and Efficient Set Intersection Approach on Modern CPUs
Published paper (link to publisher)
Bibtex

Set intersection is an important operation and widely used in both database and graph analytics applications. However, existing state-of-the-art set intersection methods only consider the size of input sets and fail to optimize for the case in which the intersection size is small. In real-world scenarios, the size of most intersections is usually orders of magnitude smaller than the size of the input sets, e.g., keyword search in databases and common neighbor search in graph analytics. In this paper, we present FESIA, a new set intersection approach on modern CPUs. The time complexity of our approach is O(n/√w+r), in which w is the SIMD width, and n and r are the size of input sets and intersection size, respectively. The key idea behind FESIA is that it first uses bitmaps to filter out unmatched elements from the input sets, and then selects suitable specialized kernels (i.e., small function blocks) at runtime to compute the final intersection on each pair of bitmap segments. In addition, all data structures in FESIA are designed to take advantage of SIMD instructions provided by vector ISAs with various SIMD widths, including SSE, AVX, and the latest AVX512. Our experiments on both real world and synthetic datasets show that our intersection method achieves more than an order of magnitude better performance than conventional scalar implementations, and up to 4x better performance than state-of-the-art SIMD implementations.

Keywords:
CPUs, Set intersection