Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
S. Fu, N. Zhang and Franz Franchetti (Proc. Parallel Architectures and Compilation Techniques (PACT), 2024)
Accelerating High-Precision Number Theoretic Transforms using Intel AVX-512
Comment: 1st Place in ACM Student Research Competition
Preprint (850 KB)
Bibtex
Fully Homomorphic Encryption (FHE) allows various platforms to manipulate encrypted data offering ideal privacy protection. However, implementing FHE requires significant computing overhead, which can make it impractical in several applications. Researchers have previously addressed the computational overhead by developing accelerators specifically designed for FHE [6–8] or on server-level GPUs [5, 9]. While proven effective, we seek to accelerate FHE on CPUs, broadening the range of applications dramatically. In this work, we choose to focus on Number Theoretic Transform (NTT), which is the computational bottleneck of multiple state-of-the- art FHE schemes that account for over 90% of FHE execution time in practice [3]. The NTT algorithm primarily depends on three integer modulo arithmetic operations; namely modular addition, modular subtraction, and modular multiplication. Specifically, we seek to optimize these operations on large multi-word integers of 128 bits and utilize Intel’s Advanced Vector Extensions 512 (AVX- 512) to parallelize our code.We then use these vectorized operations to accelerate any 𝑛-point NTT algorithm on a single CPU core.
Keywords: CPUs, Transform, Homomorphic encryption, Number theoretic transforms, Vector