Do you want to publish a course? Click here

On the Optimal Design of Triple Modular Redundancy Logic for SRAM-based FPGAs

339   0   0.0 ( 0 )
 Publication date 2007
and research's language is English




Ask ChatGPT about the research

Triple Modular Redundancy (TMR) is a suitable fault tolerant technique for SRAM-based FPGA. However, one of the main challenges in achieving 100% robustness in designs protected by TMR running on programmable platforms is to prevent upsets in the routing from provoking undesirable connections between signals from distinct redundant logic parts, which can generate an error in the output. This paper investigates the optimal design of the TMR logic (e.g., by cleverly inserting voters) to ensure robustness. Four differe



rate research

Read More

69 - R. Giordano , S.Perrella , 2018
High-speed serial links implemented in SRAM-based FPGAs have been extensively used in the trigger and data acquisition systems of High Energy Physics experiments. Usually, their application has been restricted to off-detector, mostly due the sensitivity of SRAM-based FPGA to radiation faults (single event upsets). However, the device tolerance to radiation environments can be achieved by adopting dedicated mitigation techniques such as information redundancy, hardware redundancy and configuration scrubbing. In this work, we discuss the design of a bi-directional serial link running at 6.25 Gbps based on a Xilinx Kintex-7 FPGA. The link is protected against single event upsets by means of all the above-mentioned methods. A self-synchronizing scrambler is used for DC-balance and data randomization, while the subsequent Reed-Solomon encoder/decoder detects and corrects bursts of errors in the transmitted data. The error correction capability of the line code is further increased by adopting the interleaving technique. Besides, in order to completely take advantage of available bandwidth and to cope with different rates of radiation-induced faults, the link can modulate the protection level of the Reed-Solomon code. The reliability of the link is also improved by means of modular redundancy on the frame alignment block. Besides, on the same FPGA, a scrubber repairs corrupted configuration frames in real-time. We present the test results carried out using the fault injection method. We show the performance of the link in terms of mean time between failures (MTBF) and fault tolerance to upsets.
The interplay between security and reliability is poorly understood. This paper shows how triple modular redundancy affects a side-channel attack (SCA). Our counterintuitive findings show that modular redundancy can increase SCA resiliency.
FPGAs have become emerging computing infrastructures for accelerating applications in datacenters. Meanwhile, high-level synthesis (HLS) tools have been proposed to ease the programming of FPGAs. Even with HLS, irregular data-intensive applications require explicit optimizations, among which multiple processing elements (PEs) with each owning a private BRAM-based buffer are usually adopted to process multiple data per cycle. Data routing, which dynamically dispatches multiple data to designated PEs, avoids data replication in buffers compared to statically assigning data to PEs, hence saving BRAM usage. However, the workload imbalance among PEs vastly diminishes performance when processing skew datasets. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them to share the workload of the overloaded PEs at run-time. In addition, we integrate the proposed architecture into a framework called Ditto to minimize the development efforts for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, pagerank, heavy hitter detection and hyperloglog. The results demonstrate that the generated implementations are robust to skew datasets and outperform the stateof-the-art designs in both throughput and BRAM usage efficiency.
We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS4ML, we designed a set of new parameterized interface circuits synthesizable with high-level synthesis. For accelerator configuration and management, we developed an embedded software runtime system on top of Linux. With this HW/SW layer, we addressed the challenge of dynamically shaping the data traffic on a network-on-chip to activate and support the reconfigurable pipelines of accelerators that are needed by the application workloads currently running on the SoC. We demonstrate our vertically-integrated contributions with the FPGA-based implementations of complete SoC instances booting Linux and executing computer-vision applications that process images taken from the Google Street View database.
High Bandwidth Memory (HBM) provides massive aggregated memory bandwidth by exposing multiple memory channels to the processing units. To achieve high performance, an accelerator built on top of an FPGA configured with HBM (i.e., FPGA-HBM platform) needs to scale its performance according to the available memory channels. In this paper, we propose an accelerator for BFS (Breadth-First Search) algorithm, named as ScalaBFS, that builds multiple processing elements to sufficiently exploit the high bandwidth of HBM to improve efficiency. We implement the prototype system of ScalaBFS and conduct BFS in both real-world and synthetic scale-free graphs on Xilinx Alveo U280 FPGA card real hardware. The experimental results show that ScalaBFS scales its performance almost linearly according to the available memory pseudo channels (PCs) from the HBM2 subsystem of U280. By fully using the 32 PCs and building 64 processing elements (PEs) on U280, ScalaBFS achieves a performance up to 19.7 GTEPS (Giga Traversed Edges Per Second). When conducting BFS in sparse real-world graphs, ScalaBFS achieves equivalent GTEPS to Gunrock running on the state-of-art Nvidia V100 GPU that features 64-PC HBM2 (twice memory bandwidth than U280).
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا