ترغب بنشر مسار تعليمي؟ اضغط هنا

The increasing computational demand of Deep Learning has propelled research in special-purpose inference accelerators based on emerging non-volatile memory (NVM) technologies. Such NVM crossbars promise fast and energy-efficient in-situ Matrix Vector Multiplication (MVM) thus alleviating the long-standing von Neuman bottleneck in todays digital hardware. However, the analog nature of computing in these crossbars is inherently approximate and results in deviations from ideal output values, which reduces the overall performance of Deep Neural Networks (DNNs) under normal circumstances. In this paper, we study the impact of these non-idealities under adversarial circumstances. We show that the non-ideal behavior of analog computing lowers the effectiveness of adversarial attacks, in both Black-Box and White-Box attack scenarios. In a non-adaptive attack, where the attacker is unaware of the analog hardware, we observe that analog computing offers a varying degree of intrinsic robustness, with a peak adversarial accuracy improvement of 35.34%, 22.69%, and 9.90% for white box PGD (epsilon=1/255, iter=30) for CIFAR-10, CIFAR-100, and ImageNet respectively. We also demonstrate Hardware-in-Loop adaptive attacks that circumvent this robustness by utilizing the knowledge of the NVM model.
The `Internet of Things has brought increased demand for AI-based edge computing in applications ranging from healthcare monitoring systems to autonomous vehicles. Quantization is a powerful tool to address the growing computational cost of such appl ications, and yields significant compression over full-precision networks. However, quantization can result in substantial loss of performance for complex image classification tasks. To address this, we propose a Principal Component Analysis (PCA) driven methodology to identify the important layers of a binary network, and design mixed-precision networks. The proposed Hybrid-Net achieves a more than 10% improvement in classification accuracy over binary networks such as XNOR-Net for ResNet and VGG architectures on CIFAR-100 and ImageNet datasets while still achieving up to 94% of the energy-efficiency of XNOR-Nets. This work furthers the feasibility of using highly compressed neural networks for energy-efficient neural computing in edge devices.
The recent advent of `Internet of Things (IOT) has increased the demand for enabling AI-based edge computing. This has necessitated the search for efficient implementations of neural networks in terms of both computations and storage. Although extrem e quantization has proven to be a powerful tool to achieve significant compression over full-precision networks, it can result in significant degradation in performance. In this work, we propose extremely quantized hybrid network architectures with both binary and full-precision sections to emulate the classification performance of full-precision networks while ensuring significant energy efficiency and memory compression. We explore several hybrid network architectures and analyze the performance of the networks in terms of accuracy, energy efficiency and memory compression. We perform our analysis on ResNet and VGG network architectures. Among the proposed network architectures, we show that the hybrid networks with full-precision residual connections emerge as the optimum by attaining accuracies close to full-precision networks while achieving excellent memory compression, up to 21.8x in case of VGG-19. This work demonstrates an effective way of hybridizing networks which achieve performance close to full-precision networks while attaining significant compression, furthering the feasibility of using such networks for energy-efficient neural computing in IOT-based edge devices.
Deep neural networks are a biologically-inspired class of algorithms that have recently demonstrated state-of-the-art accuracies involving large-scale classification and recognition tasks. Indeed, a major landmark that enables efficient hardware acce lerators for deep networks is the recent advances from the machine learning community that have demonstrated aggressively scaled deep binary networks with state-of-the-art accuracies. In this paper, we demonstrate how deep binary networks can be accelerated in modified von-Neumann machines by enabling binary convolutions within the SRAM array. In general, binary convolutions consist of bit-wise XNOR followed by a population-count (popcount). We present a charge sharing XNOR and popcount operation in 10 transistor SRAM cells. We have employed multiple circuit techniques including dual-read-worldines (Dual-RWL) along with a dual-stage ADC that overcomes the inaccuracies of a low precision ADC, to achieve a fairly accurate popcount. In addition, a key highlight of the present work is the fact that we propose sectioning of the SRAM array by adding switches onto the read-bitlines, thereby achieving improved parallelism. This is beneficial for deep networks, where the kernel size grows and requires to be stored in multiple sub-banks. As such, one needs to evaluate the partial popcount from multiple sub-banks and sum them up for achieving the final popcount. For n-sections per sub-array, we can perform n convolutions within one particular sub-bank, thereby improving overall system throughput as well as the energy efficiency. Our results at the array level show that the energy consumption and delay per-operation was 1.914pJ and 45ns, respectively. Moreover, an energy improvement of 2.5x, and a performance improvement of 4x was achieved by using the proposed sectioned-SRAM, compared to a non-sectioned SRAM design.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا