Operation Merging for Hardware Implementations of Fast Polar Decoders

63 0 0.0 ( 0 )

Download Cite

Added by Furkan Ercan

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Furkan Ercan - Thibaud Tonnellier - Carlo Condo

Hardware Architecture

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Polar codes are a class of linear block codes that provably achieves channel capacity. They have been selected as a coding scheme for the control channel of enhanced mobile broadband (eMBB) scenario for $5^{text{th}}$ generation wireless communication networks (5G) and are being considered for additional use scenarios. As a result, fast decoding techniques for polar codes are essential. Previous works targeting improved throughput for successive-cancellation (SC) decoding of polar codes are semi-parallel implementations that exploit special maximum-likelihood (ML) nodes. In this work, we present a new fast simplified SC (Fast-SSC) decoder architecture. Compared to a baseline Fast-SSC decoder, our solution is able to reduce the memory requirements. We achieve this through a more efficient memory utilization, which also enables to execute multiple operations in a single clock cycle. Finally, we propose new special node merging techniques that improve the throughput further, and detail a new Fast-SSC-based decoder architecture to support merged operations. The proposed decoder reduces the operation sequence requirement by up to $39%$, which enables to reduce the number of time steps to decode a codeword by $35%$. ASIC implementation results with 65 nm TSMC technology show that the proposed decoder has a throughput improvement of up to $31%$ compared to previous Fast-SSC decoder architectures.

rate research

Auto-Generation of Pipelined Hardware Designs for Polar Encoder

46 - Zhiwei Zhong 2018

This paper presents a general framework for auto-generation of pipelined polar encoder architectures. The proposed framework could be well represented by a general formula. Given arbitrary code length $N$ and the level of parallelism $M$, the formula could specify the corresponding hardware architecture. We have written a compiler which could read the formula and then automatically generate its register-transfer level (RTL) description suitable for FPGA or ASIC implementation. With this hardware generation system, one could explore the design space and make a trade-off between cost and performance. Our experimental results have demonstrated the efficiency of this auto-generator for polar encoder architectures.

Hardware Architecture

Algorithmic Obfuscation for LDPC Decoders

63 - Jingbo Zhou , Xinmiao Zhang 2021

In order to protect intellectual property against untrusted foundry, many logic-locking schemes have been developed. The main idea of logic locking is to insert a key-controlled block into a circuit to make the circuit function incorrectly without right keys. However, in the case that the algorithm implemented by the circuit is naturally fault-tolerant or self-correcting, existing logic-locking schemes do not affect the system performance much even if wrong keys are used. One example is low-density parity-check (LDPC) error-correcting decoder, which has broad applications in digital communications and storage. This paper proposes two algorithmic-level obfuscation methods for LDPC decoders. By modifying the decoding process and locking the stopping criterion, our new designs substantially degrade the decoder throughput and/or error-correcting performance when the wrong key is used. Besides, our designs are also resistant to the SAT, AppSAT and removal attacks. For an example LDPC decoder, our proposed methods reduce the throughput to less than 1/3 and/or increase the decoder error rate by at least two orders of magnitude with only 0.33% area overhead.

Hardware Architecture Cryptography and Security

Fast Thresholded SC-Flip Decoding of Polar Codes

56 - Furkan Ercan , Warren J. Gross 2020

SC-Flip (SCF) decoding algorithm shares the attention with the common polar code decoding approaches due to its low-complexity and improved error-correction performance. However, the inefficient criterion for locating the correct bit-flipping position in SCF decoding limits its improvements. Due to its improved bit-flipping criterion, Thresholded SCF (TSCF) decoding algorithm exhibits a superior error-correction performance and lower computational complexity than SCF decoding. However, the parameters of TSCF decoding depend on multiple channel and code parameters, and are obtained via Monte-Carlo simulations. Our main goal is to realize TSCF decoding as a practical polar decoder implementation. To this end, we first realize an approximated threshold value that is independent of the code parameters and precomputations. The proposed approximation has negligible error-correction performance degradation on the TSCF decoding. Then, we validate an alternative approach for forming a critical set that does not require precomputations, which also paves the way to the implementation of the Fast-TSCF decoder. Compared to the existing fast SCF implementations, the proposed Fast-TSCF decoder has $0.24$ to $0.41$ dB performance gain at frame error rate of $10^{-3}$, without any extra cost. Compared to the TSCF decoding, Fast-TSCF does not depend on precomputations and requires $87%$ fewer decoding steps. Finally, implementation results in TSMC 65nm CMOS technology show that the Fast-TSCF decoder is $20%$ and $82%$ more area-efficient than the state-of-the-art fast SCF and fast SC-List decoder architectures, respectively.

Hardware Architecture

Energy-efficient Decoders for Compressive Sensing: Fundamental Limits and Implementations

406 - Tongxin Li , Mayank Bakshi , Pulkit Grover 2014

The fundamental problem considered in this paper is What is the textit{energy} consumed for the implementation of a emph{compressive sensing} decoding algorithm on a circuit?. Using the information-friction framework, we examine the smallest amount of textit{bit-meters} as a measure for the energy consumed by a circuit. We derive a fundamental lower bound for the implementation of compressive sensing decoding algorithms on a circuit. In the setting where the number of measurements scales linearly with the sparsity and the sparsity is sub-linear with the length of the signal, we show that the textit{bit-meters} consumption for these algorithms is order-tight, i.e., it matches the lower bound asymptotically up to a constant factor. Our implementations yield interesting insights into design of energy-efficient circuits that are not captured by the notion of computational efficiency alone.

Information Theory Information Theory

FPGA Implementations of Layered MinSum LDPC Decoders Using RCQ Message Passing

92 - Caleb Terrill , Linfang Wang , Sean Chen 2021

Non-uniform message quantization techniques such as reconstruction-computation-quantization (RCQ) improve error-correction performance and decrease hardware complexity of low-density parity-check (LDPC) decoders that use a flooding schedule. Layered MinSum RCQ (L-msRCQ) enables message quantization to be utilized for layered decoders and irregular LDPC codes. We investigate field-programmable gate array (FPGA) implementations of L-msRCQ decoders. Three design methods for message quantization are presented, which we name the Lookup, Broadcast, and Dribble methods. The decoding performance and hardware complexity of these schemes are compared to a layered offset MinSum (OMS) decoder. Simulation results on a (16384, 8192) protograph-based raptor-like (PBRL) LDPC code show that a 4-bit L-msRCQ decoder using the Broadcast method can achieve a 0.03 dB improvement in error-correction performance while using 12% fewer registers than the OMS decoder. A Broadcast-based 3-bit L-msRCQ decoder uses 15% fewer lookup tables, 18% fewer registers, and 13% fewer routed nets than the OMS decoder, but results in a 0.09 dB loss in performance.

Signal Processing Hardware Architecture Information Theory