
Testing Compilers for Programmable Switches Through Switch Hardware Simulation

Published by: Michael Wong
Publication date: 2020
Research field: Informatics Engineering
Language: English





Programmable switches have emerged as powerful and flexible alternatives to fixed-function forwarding devices. But because of the unique hardware constraints of network switches, the design and implementation of compilers targeting these devices is tedious and error-prone. Despite the important role that compilers play in software development, there is a dearth of tools for testing compilers for programmable network devices. We present Druzhba, a programmable switch simulator used for testing compilers targeting programmable packet-processing substrates. We show that we can model the low-level behavior of a switch's programmable hardware. We further show how our machine model can be used by compiler developers to target Druzhba as a compiler backend. Generated machine-code programs are fed into Druzhba and tested using a fuzzing-based approach that allows compiler developers to test the correctness of their compilers. Using a program-synthesis-based compiler as a case study, we demonstrate how Druzhba has been successful in testing compiler-generated machine code for our simulated switch pipeline instruction set.
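The abstract does not detail the test harness, but the loop it describes -- run the compiler-generated machine code on the simulated pipeline and fuzz with random packets while comparing against the intended program behavior -- can be pictured as a simple differential test. The sketch below is a minimal illustration under assumed interfaces: run_reference and run_on_simulator are hypothetical stand-ins, not Druzhba's actual API, and the packet program is a toy.

```python
# Minimal sketch of fuzzing-based differential testing of a switch compiler.
# run_reference() and run_on_simulator() are hypothetical placeholders: the
# first models the source program's intended semantics, the second models
# executing the compiled machine code on a Druzhba-like simulated pipeline.
import random

def run_reference(pkt: dict) -> dict:
    # Placeholder for the source-level packet program's semantics,
    # e.g. a stateless increment of one header field.
    return {"field0": (pkt["field0"] + 1) & 0xFFFFFFFF}

def run_on_simulator(pkt: dict) -> dict:
    # Placeholder for running the generated machine code on the simulator;
    # a real harness would invoke the simulator on the compiled program.
    return {"field0": (pkt["field0"] + 1) & 0xFFFFFFFF}

def fuzz(num_tests: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for i in range(num_tests):
        pkt = {"field0": rng.getrandbits(32)}      # random input packet
        expected = run_reference(dict(pkt))        # spec-level result
        actual = run_on_simulator(dict(pkt))       # simulated pipeline result
        assert actual == expected, f"mismatch on test {i}: {pkt} -> {actual} != {expected}"
    print(f"{num_tests} random packets passed")

if __name__ == "__main__":
    fuzz()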




Read also

Programmable photonic circuits of reconfigurable interferometers can be used to implement arbitrary operations on optical modes, providing a flexible platform for accelerating tasks in quantum simulation, signal processing, and artificial intelligence. A major obstacle to scaling up these systems is static fabrication error, where small component errors within each device accrue to produce significant errors within the circuit computation. Mitigating this error usually requires numerical optimization dependent on real-time feedback from the circuit, which can greatly limit the scalability of the hardware. Here we present a deterministic approach to correcting circuit errors by locally correcting hardware errors within individual optical gates. We apply our approach to simulations of large-scale optical neural networks and infinite impulse response filters implemented in programmable photonics, finding that they remain resilient to component error well beyond modern-day process tolerances. Our results highlight a new avenue for scaling up programmable photonics to hundreds of modes within current-day fabrication processes.
Even with generational improvements in DRAM technology, memory access latency remains the major bottleneck for application accelerators, primarily due to limitations of memory interface IPs that cannot fully account for variations in target applications, the algorithms used, and accelerator architectures. Since developing memory controllers for different applications is time-consuming, this paper introduces a modular and programmable memory controller that can be configured for different target applications on available hardware resources. The proposed memory controller efficiently supports cache-line accesses along with bulk memory transfers. The user can configure the controller depending on the available logic resources on the FPGA, the memory access pattern, and the external memory specifications. The modular design supports various memory access optimization techniques, including request scheduling, internal caching, and direct memory access. These techniques reduce overall latency while maintaining high sustained bandwidth. We implement the system on a state-of-the-art FPGA and evaluate its performance using two widely studied domains: graph analytics and deep learning workloads. We show up to 58% improvement in overall memory access time on CNN and GCN workloads compared with commercial memory controller IPs.
Artificial intelligence (AI) and Machine Learning (ML) are becoming pervasive in today's applications, such as autonomous vehicles, healthcare, aerospace, cybersecurity, and many other critical applications. Ensuring the reliability and robustness of the underlying AI/ML hardware is therefore of paramount importance. In this paper, we explore and evaluate the reliability of different AI/ML hardware. The first section outlines the reliability issues in a commercial systolic-array-based ML accelerator in the presence of faults arising from device-level non-idealities in the DRAM. Next, we quantify the impact of circuit-level faults in the MSB and LSB logic cones of the Multiply and Accumulate (MAC) block of the AI accelerator on AI/ML accuracy. Finally, we present two key reliability issues in emerging neuromorphic hardware platforms -- circuit aging and endurance -- and present our system-level approach to mitigating them.
Hardware flaws are permanent and potent: hardware cannot be patched once fabricated, and any flaws may undermine any software executing on top. Consequently, verification time dominates implementation time. The gold standard in hardware Design Verification (DV) is concentrated at two extremes: random dynamic verification and formal verification. Both struggle to root out the subtle flaws in complex hardware that often manifest as security vulnerabilities. The root problem with random verification is its undirected nature, making it inefficient, while formal verification is constrained by the state-space explosion problem, making it infeasible against complex designs. What is needed is a solution that is directed, yet under-constrained. Instead of making incremental improvements to existing DV approaches, we leverage the observation that existing software fuzzers already provide such a solution, and adapt them for hardware DV. Specifically, we translate RTL hardware to a software model and fuzz that model. The central challenge we address is how best to mitigate the differences between the hardware and software execution models. This includes: 1) how to represent test cases, 2) what the hardware equivalent of a crash is, 3) what an appropriate coverage metric is, and 4) how to create a general-purpose fuzzing harness for hardware. To evaluate our approach, we fuzz four IP blocks from Google's OpenTitan SoC. Our experiments reveal a two-orders-of-magnitude reduction in run time to achieve Finite State Machine (FSM) coverage over traditional dynamic verification schemes. Moreover, with our design-agnostic harness, we achieve over 88% HDL line coverage in three of our four designs -- even without any initial seeds.
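The four design questions listed in this abstract (test-case representation, the crash analogue, the coverage metric, and a general-purpose harness) can be pictured with a toy coverage-guided fuzz loop over a software model of a small hardware FSM. Everything in the sketch below -- the FSM, the seeded bug, the security assertion standing in for a crash, and states-reached as coverage -- is a hypothetical illustration, not the paper's OpenTitan setup or tooling.

```python
# Toy coverage-guided fuzzing of a software model of a hardware FSM
# (a lock that must never open without the full two-byte code).
import random

def fsm_model(test_case: bytes) -> set[str]:
    """One simulated run: test_case acts as a cycle-by-cycle input trace,
    the returned set of visited states is the coverage signal, and an
    AssertionError plays the role of a hardware 'crash'."""
    state, visited, unlocked_properly = "LOCKED", {"LOCKED"}, False
    for byte in test_case:
        if state == "LOCKED" and byte == 0xA5:
            state = "HALF_OPEN"
        elif state == "HALF_OPEN" and byte == 0x5A:
            state, unlocked_properly = "OPEN", True
        elif state != "OPEN" and byte == 0xFF:   # seeded bug: backdoor input
            state = "OPEN"
        elif state != "OPEN":
            state = "LOCKED"
        visited.add(state)
        # Security assertion: the lock may only open via the full code.
        assert not (state == "OPEN" and not unlocked_properly), "lock opened without full code"
    return visited

def fuzz(rounds: int = 10_000, seed: int = 1) -> None:
    rng = random.Random(seed)
    corpus, global_coverage = [bytes([0])], set()
    for _ in range(rounds):
        parent = rng.choice(corpus)
        # Mutate the parent and extend it by one random input byte.
        child = bytes((b ^ rng.getrandbits(8)) if rng.random() < 0.3 else b
                      for b in parent) + bytes([rng.getrandbits(8)])
        try:
            covered = fsm_model(child)
        except AssertionError as e:
            print(f"bug found with input {child.hex()}: {e}")
            return
        if not covered <= global_coverage:   # keep inputs that reach new states
            global_coverage |= covered
            corpus.append(child)
    print("no violation found")

if __name__ == "__main__":
    fuzz()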
In-network computation has been widely used to accelerate data-intensive distributed applications. Some computational tasks, traditionally performed on servers, are offloaded to the network (i.e., programmable switches). However, the computational capacity of programmable switches is limited to simple integer arithmetic operations, while many applications require on-the-fly floating-point operations. To address this issue, prior approaches either adopt a float-to-integer method or directly offload computational tasks to the local CPUs of switches, incurring accuracy loss and delayed processing. To this end, we propose NetFC, a table-lookup method that achieves on-the-fly in-network floating-point arithmetic operations with nearly no accuracy loss. NetFC adopts a divide-and-conquer mechanism that converts the original huge table into several much smaller tables together with some integer operations. NetFC further adopts a scaling-factor mechanism to improve computational accuracy, and a prefix-based lossless table compression method to reduce memory overhead. We use different types of datasets to evaluate NetFC. The experimental results show that the average accuracy of NetFC can be as high as 99.94% even in the worst case, with only 448KB of memory consumption. Furthermore, we integrate NetFC into Sonata for detecting the Slowloris attack, yielding a significant decrease in detection delay.
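As a rough illustration of the divide-and-conquer idea -- replacing one huge lookup table with a few much smaller tables plus integer operations, with a fixed-point scaling factor to preserve precision -- the sketch below approximates 2^x with two 256-entry tables. The parameters and decomposition are assumptions chosen for clarity; they are not NetFC's actual tables, scaling mechanism, or compression scheme.

```python
# Generic small-table decomposition of 2^x for a 16-bit fixed-point input,
# using only table lookups and integer arithmetic.

FRAC_BITS = 16            # fixed-point scaling factor: value = code / 2**FRAC_BITS
SCALE = 1 << FRAC_BITS

# A single table over every 16-bit input would need 65536 entries.
# Splitting the index into two 8-bit halves needs only 2 * 256 entries,
# because 2^((hi*256 + lo)/65536) = 2^(hi/256) * 2^(lo/65536).
HI_TABLE = [round(SCALE * 2.0 ** (hi / 256.0)) for hi in range(256)]
LO_TABLE = [round(SCALE * 2.0 ** (lo / 65536.0)) for lo in range(256)]

def exp2_fixed(frac_code: int) -> int:
    """Approximate 2^(frac_code / 65536) in Q16 fixed point using two
    table lookups, one integer multiply, and one shift."""
    hi, lo = (frac_code >> 8) & 0xFF, frac_code & 0xFF
    return (HI_TABLE[hi] * LO_TABLE[lo]) >> FRAC_BITS

if __name__ == "__main__":
    x = 0.7310
    code = int(x * SCALE) & 0xFFFF
    approx = exp2_fixed(code) / SCALE
    print(f"2^{x} ~ {approx:.6f} (exact {2.0 ** x:.6f})")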