A Fast Finite Field Multiplier for SIKE

56 0 0.0 ( 0 )

Download Cite

Added by Yeonsoo Jeon

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Yeonsoo Jeon - Dongsuk Jeon

Hardware Architecture

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Various post-quantum cryptography algorithms have been recently proposed. Supersingluar isogeny Diffie-Hellman key exchange (SIKE) is one of the most promising candidates due to its small key size. However, the SIKE scheme requires numerous finite field multiplications for its isogeny computation, and hence suffers from slow encryption and decryption process. In this paper, we propose a fast finite field multiplier design that performs multiplications in GF(p) with high throughput and low latency. The design accelerates the computation by adopting deep pipelining, and achieves high hardware utilization through data interleaving. The proposed finite field multiplier demonstrates 4.48 times higher throughput than prior work based on the identical fast multiplication algorithm and 1.43 times higher throughput than the state-of-the-art fast finite field multiplier design aimed at SIKE.

rate research

Fast Swapping in a Quantum Multiplier Modelled as a Queuing Network

98 - Evan E. Dobbs , Robert Basmadjian , Alexandru Paler 2021

Predicting the optimum SWAP depth of a quantum circuit is useful because it informs the compiler about the amount of necessary optimization. Fast prediction methods will prove essential to the compilation of practical quantum circuits. In this paper, we propose that quantum circuits can be modeled as queuing networks, enabling efficient extraction of the parallelism and duration of SWAP circuits. To provide preliminary substantiation of this approach, we compile a quantum multiplier circuit and use a queuing network model to accurately determine the quantum circuit parallelism and duration. Our method is scalable and has the potential speed and precision necessary for large scale quantum circuit compilation.

Quantum Physics

Operation Merging for Hardware Implementations of Fast Polar Decoders

62 - Furkan Ercan , Thibaud Tonnellier , Carlo Condo 2020

Polar codes are a class of linear block codes that provably achieves channel capacity. They have been selected as a coding scheme for the control channel of enhanced mobile broadband (eMBB) scenario for $5^{text{th}}$ generation wireless communication networks (5G) and are being considered for additional use scenarios. As a result, fast decoding techniques for polar codes are essential. Previous works targeting improved throughput for successive-cancellation (SC) decoding of polar codes are semi-parallel implementations that exploit special maximum-likelihood (ML) nodes. In this work, we present a new fast simplified SC (Fast-SSC) decoder architecture. Compared to a baseline Fast-SSC decoder, our solution is able to reduce the memory requirements. We achieve this through a more efficient memory utilization, which also enables to execute multiple operations in a single clock cycle. Finally, we propose new special node merging techniques that improve the throughput further, and detail a new Fast-SSC-based decoder architecture to support merged operations. The proposed decoder reduces the operation sequence requirement by up to $39%$, which enables to reduce the number of time steps to decode a codeword by $35%$. ASIC implementation results with 65 nm TSMC technology show that the proposed decoder has a throughput improvement of up to $31%$ compared to previous Fast-SSC decoder architectures.

Hardware Architecture

When is Every Quasi-Multiplier a Multiplier?

48 - Lawrence G. Brown 2018

We answer the title question for sigma-unital C*-algebras. The answer is that the algebra must be the direct sum of a dual C*-algebra and a C*-algebra satisfying a certain local unitality condition. We also discuss similar problems in the context of Hilbert C*-bimodules and imprimitivity bimodules and in the context of centralizers of Pedersens ideal.

Operator Algebras Functional Analysis

Fast computation of a rational point of a variety over a finite field

79 - Antonio Cafure Departamento den Matematica , Facultad de Ciencias Exactas y Naturales 2004

We exhibit a probabilistic algorithm which computes a rational point of an absolutely irreducible variety over a finite field defined by a reduced regular sequence. Its time--space complexity is roughly quadratic in the logarithm of the cardinality of the field and a geometric invariant of the input system (called its degree), which is always bounded by the Bezout number of the system. Our algorithm works for fields of any characteristic, but requires the cardinality of the field to be greater than a quantity which is roughly the fourth power of the degree of the input variety.

Number Theory Algebraic Geometry

Fast Thresholded SC-Flip Decoding of Polar Codes

56 - Furkan Ercan , Warren J. Gross 2020

SC-Flip (SCF) decoding algorithm shares the attention with the common polar code decoding approaches due to its low-complexity and improved error-correction performance. However, the inefficient criterion for locating the correct bit-flipping position in SCF decoding limits its improvements. Due to its improved bit-flipping criterion, Thresholded SCF (TSCF) decoding algorithm exhibits a superior error-correction performance and lower computational complexity than SCF decoding. However, the parameters of TSCF decoding depend on multiple channel and code parameters, and are obtained via Monte-Carlo simulations. Our main goal is to realize TSCF decoding as a practical polar decoder implementation. To this end, we first realize an approximated threshold value that is independent of the code parameters and precomputations. The proposed approximation has negligible error-correction performance degradation on the TSCF decoding. Then, we validate an alternative approach for forming a critical set that does not require precomputations, which also paves the way to the implementation of the Fast-TSCF decoder. Compared to the existing fast SCF implementations, the proposed Fast-TSCF decoder has $0.24$ to $0.41$ dB performance gain at frame error rate of $10^{-3}$, without any extra cost. Compared to the TSCF decoding, Fast-TSCF does not depend on precomputations and requires $87%$ fewer decoding steps. Finally, implementation results in TSMC 65nm CMOS technology show that the Fast-TSCF decoder is $20%$ and $82%$ more area-efficient than the state-of-the-art fast SCF and fast SC-List decoder architectures, respectively.

Hardware Architecture