S. Fu, N. Zhang and Franz Franchetti (Proc. Parallel Architectures and Compilation Techniques (PACT), 2024)
Accelerating High-Precision Number Theoretic Transforms using Intel AVX-512
Preprint (850 KB)
Bibtex

Fully Homomorphic Encryption (FHE) allows various platforms to manipulate encrypted data offering ideal privacy protection. However, implementing FHE requires significant computing overhead, which can make it impractical in several applications. Researchers have previously addressed the computational overhead by developing accelerators specifically designed for FHE [6–8] or on server-level GPUs [5, 9]. While proven effective, we seek to accelerate FHE on CPUs, broadening the range of applications dramatically. In this work, we choose to focus on Number Theoretic Transform (NTT), which is the computational bottleneck of multiple state-of-the- art FHE schemes that account for over 90% of FHE execution time in practice [3]. The NTT algorithm primarily depends on three integer modulo arithmetic operations; namely modular addition, modular subtraction, and modular multiplication. Specifically, we seek to optimize these operations on large multi-word integers of 128 bits and utilize Intel’s Advanced Vector Extensions 512 (AVX- 512) to parallelize our code.We then use these vectorized operations to accelerate any 𝑛-point NTT algorithm on a single CPU core.

Keywords:
CPUs, Transform, Homomorphic encryption, Number theoretic transforms, Vector