The ubiquitous use of IoT and machine learning applications is creating large amounts of data that require accurate and real-time processing. Although edge-based smart data processing can be enabled by deploying pretrained models, the energy and memory constraints of edge devices necessitate distributed deep learning between the edge and the cloud for complex data. In this paper, we propose a distributed AI system that exploits both the edge and the cloud for training and inference. We propose a new architecture, MEANet, with a main block, an extension block, and an adaptive block for the edge. The inference process can terminate at the main block, the extension block, or the cloud. MEANet is trained to categorize inputs into easy, hard, and complex classes. The main block identifies instances of easy and hard classes and classifies easy classes with high confidence. Only data with high probabilities of belonging to hard classes are sent to the extension block for prediction. Further, only if the neural network at the edge shows low confidence in its prediction is the instance considered complex and sent to the cloud for further processing. This training technique allows the majority of inferences to complete on edge devices, with the cloud used only for the small set of complex inputs identified by the edge. The performance of the proposed system is evaluated via extensive experiments using modified ResNet and MobileNetV2 models on the CIFAR-100 and ImageNet datasets. The results show that the proposed distributed model improves accuracy and reduces energy consumption, indicating its capacity to adapt.
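A minimal sketch of the three-stage edge/cloud routing described above; the block interfaces, confidence thresholds, and helper names are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def meanet_style_inference(x, main_block, extension_block, cloud_model,
                           easy_threshold=0.9, edge_threshold=0.8):
    """Illustrative edge/cloud routing for a single-input batch x (assumed shapes/APIs)."""
    # Main block: classifies easy inputs and flags likely-hard ones.
    probs_main = F.softmax(main_block(x), dim=-1)
    conf_main, pred_main = probs_main.max(dim=-1)
    if conf_main.item() >= easy_threshold:
        return pred_main.item(), "main"          # easy: exit at the main block

    # Extension block: handles inputs routed as "hard" by the main block.
    probs_ext = F.softmax(extension_block(x), dim=-1)
    conf_ext, pred_ext = probs_ext.max(dim=-1)
    if conf_ext.item() >= edge_threshold:
        return pred_ext.item(), "extension"      # hard: exit at the extension block

    # Low edge confidence: treat as complex and offload to the cloud model.
    return cloud_model(x).argmax(dim=-1).item(), "cloud"
```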
In-Memory Computing (IMC) hardware using Memristive Crossbar Arrays (MCAs) is gaining popularity for accelerating Deep Neural Networks (DNNs), since it alleviates the memory wall problem associated with von Neumann architectures. The hardware efficiency (energy, latency, and area) as well as the application accuracy (considering device and circuit non-idealities) of DNNs mapped to such hardware are co-dependent on network parameters, such as kernel size and depth, and on hardware architecture parameters, such as crossbar size. However, co-optimizing both network and hardware parameters presents a challenging search space comprising different kernel sizes mapped to varying crossbar sizes. To that effect, we propose NAX -- an efficient neural architecture search engine that co-designs the neural network and the IMC-based hardware architecture. NAX explores the aforementioned search space to determine the kernel and corresponding crossbar sizes for each DNN layer that achieve optimal tradeoffs between hardware efficiency and application accuracy. Our results show that the networks found by NAX have heterogeneous crossbar sizes across layers, and achieve optimal hardware efficiency and accuracy considering the non-idealities in crossbars. On CIFAR-10 and Tiny ImageNet, our models achieve 0.8% and 0.2% higher accuracy, and 17% and 4% lower EDAP (energy-delay-area product), compared to baseline ResNet-20 and ResNet-18 models, respectively.
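To make the joint search space concrete, here is a toy per-layer co-search over kernel and crossbar sizes; the candidate sets, cost proxies, and scoring weight are invented for illustration and do not reflect NAX's actual search algorithm or models:

```python
import itertools

# Illustrative candidate sets; NAX's real search space and cost models differ.
KERNEL_SIZES = [1, 3, 5]
CROSSBAR_SIZES = [32, 64, 128]

def edap(kernel_size, crossbar_size):
    """Toy energy-delay-area proxy: larger kernels and crossbars cost more."""
    return (kernel_size ** 2) * crossbar_size * 1e-3

def accuracy_proxy(kernel_size, crossbar_size):
    """Toy accuracy proxy: larger kernels help, larger crossbars add non-ideality."""
    return 0.9 + 0.01 * kernel_size - 0.0002 * crossbar_size

def best_config_for_layer(alpha=0.5):
    # Maximize accuracy while penalizing EDAP; a different layer could land on a
    # different crossbar size, giving the heterogeneous mapping described above.
    candidates = itertools.product(KERNEL_SIZES, CROSSBAR_SIZES)
    return max(candidates, key=lambda c: accuracy_proxy(*c) - alpha * edap(*c))

print(best_config_for_layer())
```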
Memory effects are studied in the simplest scalar-tensor theory, the Brans--Dicke (BD) theory. To this end, we introduce, in BD theory, novel Kundt spacetimes (without and with gyratonic terms), which serve as backgrounds for the ensuing analysis of memory. The BD parameter $\omega$ and the scalar field ($\phi$) profile, expectedly, distinguish between different solutions. Choosing specific localised forms for the free metric functions $H(u)$ (related to the wave profile) and $J(u)$ (the gyraton), we obtain displacement memory effects using both geodesics and geodesic deviation. An interesting and easy-to-understand exactly solvable case arises when $\omega=-2$ (with $J(u)$ absent), which we discuss in detail. For other values of $\omega$ (with or without $J$), numerically obtained geodesics lead to results on displacement memory which appear to match qualitatively with those found from a deviation analysis. Thus, the issue of how memory effects in BD theory may arise and also differ from their GR counterparts is now partially addressed, at least theoretically, within the context of this new class of Kundt geometries.
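For context, the deviation analysis mentioned above rests on the standard geodesic deviation equation (stated here in one common sign convention as background material, not as a result of the paper):

$$ \frac{D^{2}\xi^{\mu}}{d\tau^{2}} = -R^{\mu}{}_{\alpha\nu\beta}\,u^{\alpha}\,\xi^{\nu}\,u^{\beta}, $$

where $u^{\mu}$ is the four-velocity along a reference geodesic and $\xi^{\mu}$ the separation vector to a neighbouring geodesic. A permanent change in $\xi^{\mu}$ after the wave has passed is read as displacement memory, while a residual change in $D\xi^{\mu}/d\tau$ would indicate velocity memory.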
The increasing computational demand of Deep Learning has propelled research into special-purpose inference accelerators based on emerging non-volatile memory (NVM) technologies. Such NVM crossbars promise fast and energy-efficient in-situ Matrix Vector Multiplication (MVM), thus alleviating the long-standing von Neumann bottleneck in today's digital hardware. However, the analog nature of computing in these crossbars is inherently approximate and results in deviations from ideal output values, which reduces the overall performance of Deep Neural Networks (DNNs) under normal circumstances. In this paper, we study the impact of these non-idealities under adversarial circumstances. We show that the non-ideal behavior of analog computing lowers the effectiveness of adversarial attacks, in both Black-Box and White-Box attack scenarios. In a non-adaptive attack, where the attacker is unaware of the analog hardware, we observe that analog computing offers a varying degree of intrinsic robustness, with a peak adversarial accuracy improvement of 35.34%, 22.69%, and 9.90% for White-Box PGD ($\epsilon=1/255$, 30 iterations) on CIFAR-10, CIFAR-100, and ImageNet, respectively. We also demonstrate Hardware-in-Loop adaptive attacks that circumvent this robustness by utilizing knowledge of the NVM model.
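For reference, the White-Box PGD attack evaluated above is the standard projected-gradient-descent perturbation; a minimal sketch follows, where the step size `alpha` is an assumed value and the loop omits the optional random start:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=1/255, alpha=0.25/255, iters=30):
    """Standard L-infinity PGD on inputs in [0, 1]; alpha is an illustrative choice."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient ascent on the loss, then project back into the epsilon-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv.detach()
```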
The pervasiveness of the Internet-of-Things in our daily lives has led to a recent surge in fog computing, encompassing a collaboration of cloud computing and edge intelligence. To that effect, deep learning has been a major driving force towards enabling such intelligent systems. However, growing model sizes in deep learning pose a significant challenge to deployment on resource-constrained edge devices. Moreover, in a distributed intelligence environment, efficient workload distribution between edge and cloud systems is necessary. To address these challenges, we propose a conditionally deep hybrid neural network for enabling AI-based fog computing. The proposed network can be deployed in a distributed manner, consisting of quantized layers and early exits at the edge and full-precision layers on the cloud. During inference, if an early exit is highly confident in its classification result, the sample exits at the edge; the deeper layers on the cloud are activated only for the remaining samples, which can improve energy efficiency and reduce inference latency. We perform an extensive design space exploration with the goal of minimizing energy consumption at the edge while achieving state-of-the-art classification accuracies on image classification tasks. We show that with binarized layers at the edge, the proposed conditional hybrid network can process 65% of inferences at the edge, leading to a 5.5x reduction in computational energy with minimal accuracy degradation on the CIFAR-10 dataset. For the more complex CIFAR-100 dataset, we observe that the proposed network with 4-bit quantization at the edge achieves 52% early classification at the edge with a 4.8x energy reduction. This analysis gives us insights into designing efficient hybrid networks that achieve significantly higher energy efficiency than full-precision networks for edge-cloud based distributed intelligence systems.
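A minimal sketch of the confidence-gated edge/cloud split described above, including the fraction of samples classified at the edge; the network interfaces, the assumption that the cloud layers consume edge features, and the threshold are illustrative, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def edge_cloud_inference(batch, edge_net, exit_head, cloud_net, threshold=0.9):
    """Illustrative early-exit split over a batch; APIs and threshold are assumptions."""
    features = edge_net(batch)                      # quantized layers at the edge
    probs = F.softmax(exit_head(features), dim=-1)  # early-exit classifier
    conf, preds = probs.max(dim=-1)
    stay = conf >= threshold                        # confident samples exit at the edge

    if (~stay).any():
        # Only low-confidence samples activate the full-precision cloud layers.
        preds[~stay] = cloud_net(features[~stay]).argmax(dim=-1)

    edge_fraction = stay.float().mean().item()      # e.g. ~0.65 reported for CIFAR-10
    return preds, edge_fraction
```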
Memory effects in the exact Kundt wave spacetimes are shown to arise in the behaviour of geodesics in such spacetimes. The types of Kundt spacetimes we consider here are direct products of the form $H^2 \times M(1,1)$ and $S^2 \times M(1,1)$. Both geometries have constant scalar curvature. We consider a scenario in which the initial velocities of the transverse geodesic coordinates are set to zero (before the arrival of the pulse) in a spacetime with non-vanishing background curvature. We look for changes in the separation between pairs of geodesics caused by the pulse. Any relative change observed in the position and velocity profiles of geodesics after the burst can be solely attributed to the wave (hence, a memory effect). For constant negative curvature, we find that there is a permanent change in the separation of geodesics after the pulse has departed. Thus, there is displacement memory, though no velocity memory is found. In the case of constant positive scalar curvature (Plebanski-Hacyan spacetimes), we find both displacement and velocity memory along one direction. In the other direction, a new kind of memory (which we term the frequency memory effect) is observed, where the separation between the geodesics shows periodic oscillations once the pulse has left. We also carry out similar analyses for spacetimes with a non-constant scalar curvature, which may be positive or negative. The results here seem to qualitatively agree with those for constant scalar curvature, thereby suggesting a link between the nature of memory and curvature.
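The following toy computation illustrates how displacement and velocity memory are read off numerically from geodesic separation before and after a pulse; the deviation-style equation, the Gaussian profile, and all constants are assumptions for illustration and do not reproduce the paper's Kundt metrics:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy model: separation xi obeys xi'' = -K(u) * xi, with K(u) a localized pulse
# added to a constant background term. Profiles and parameters are assumptions.
def pulse(u, amplitude=1.0, width=0.5):
    return amplitude * np.exp(-(u / width) ** 2)

def rhs(u, state, background=0.0):
    xi, dxi = state
    return [dxi, -(background + pulse(u)) * xi]

sol = solve_ivp(rhs, [-10.0, 10.0], [1.0, 0.0], max_step=0.01)
xi_final, dxi_final = sol.y[0, -1], sol.y[1, -1]

# A permanent offset in xi long after the pulse indicates displacement memory;
# a nonzero residual dxi indicates velocity memory.
print(f"separation after pulse: {xi_final:.3f}, relative velocity: {dxi_final:.3f}")
```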
The analog nature of computing in memristive crossbars poses significant issues due to various non-idealities, such as parasitic resistances and the non-linear I-V characteristics of the devices. These non-idealities can have a detrimental impact on the functionality, i.e., the computational accuracy, of crossbars. Past works have explored modeling the non-idealities using analytical techniques. However, several non-idealities have data-dependent behavior, which cannot be captured using analytical (non-data-dependent) models, thereby limiting their suitability for predicting application accuracy. To address this, we propose a Generalized Approach to Emulating Non-Ideality in Memristive Crossbars using Neural Networks (GENIEx), which accurately captures the data-dependent nature of non-idealities. We perform extensive HSPICE simulations of crossbars with different voltage and conductance combinations. Following that, we train a neural network to learn the transfer characteristics of the non-ideal crossbar. Next, we build a functional simulator which includes key architectural facets such as tiling and bit-slicing to analyze the impact of non-idealities on the classification accuracy of large-scale neural networks. We show that GENIEx achieves low root mean square errors (RMSE) of $0.25$ and $0.7$ for low and high voltages, respectively, compared to HSPICE. These errors are $7\times$ and $12.8\times$ lower than those of an analytical model, which can only capture the linear non-idealities. Further, using the functional simulator and GENIEx, we demonstrate that an analytical model can overestimate the degradation in classification accuracy by $\geq 10\%$ on CIFAR-100 and $3.7\%$ on ImageNet compared to GENIEx.
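To illustrate the idea of learning a crossbar's non-ideal transfer characteristics from circuit simulations, here is a small surrogate-model sketch; the layer sizes, input features, and training setup are assumptions and not GENIEx's actual architecture:

```python
import torch
import torch.nn as nn

class CrossbarSurrogate(nn.Module):
    """Toy surrogate mapping (input voltages, column conductances) -> non-ideal output."""
    def __init__(self, crossbar_size=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * crossbar_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted (non-ideal) column output
        )

    def forward(self, voltages, conductances):
        return self.net(torch.cat([voltages, conductances], dim=-1))

def fit(model, loader, epochs=10):
    """Fit the surrogate to HSPICE-style (voltages, conductances, output) samples."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for v, g, target in loader:
            loss = nn.functional.mse_loss(model(v, g), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```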
The 'Internet of Things' has brought increased demand for AI-based edge computing in applications ranging from healthcare monitoring systems to autonomous vehicles. Quantization is a powerful tool to address the growing computational cost of such applications, and yields significant compression over full-precision networks. However, quantization can result in a substantial loss of performance for complex image classification tasks. To address this, we propose a Principal Component Analysis (PCA) driven methodology to identify the important layers of a binary network and design mixed-precision networks. The proposed Hybrid-Net achieves a more than 10% improvement in classification accuracy over binary networks such as XNOR-Net for ResNet and VGG architectures on the CIFAR-100 and ImageNet datasets, while still achieving up to 94% of the energy efficiency of XNOR-Nets. This work furthers the feasibility of using highly compressed neural networks for energy-efficient neural computing in edge devices.
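One way such a PCA-driven importance measure can be realized is to count how many principal components of a layer's activations are needed to explain most of their variance; the 99% threshold and the flattening of activations into a sample-by-feature matrix below are illustrative assumptions, not necessarily Hybrid-Net's exact criterion:

```python
import numpy as np

def significant_components(activations, var_threshold=0.99):
    """activations: (num_samples, num_features) matrix of a layer's outputs."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Number of components needed to reach the variance threshold.
    return int(np.searchsorted(explained, var_threshold) + 1)

# Layers whose activations need many components to reach the threshold can be
# treated as more "important" and kept at higher precision in a mixed design.
```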
Adversarial examples are perturbed inputs that are designed (using a deep learning network's (DLN) parameter gradients) to mislead the DLN at test time. Intuitively, constraining the dimensionality of inputs or parameters of a network reduces the space in which adversarial examples exist. Guided by this intuition, we demonstrate that discretization greatly improves the robustness of DLNs against adversarial attacks. Specifically, discretizing the input space (reducing the allowed pixel levels from 256 values, or 8-bit, to 4 values, or 2-bit) substantially improves the adversarial robustness of DLNs over a wide range of perturbations, with minimal loss in test accuracy. Furthermore, we find that Binary Neural Networks (BNNs) and related variants are intrinsically more robust than their full-precision counterparts in adversarial scenarios. Combining input discretization with BNNs furthers the robustness, even removing the need for adversarial training for certain perturbation magnitudes. We evaluate the effect of discretization on the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets. Across all datasets, we observe maximal adversarial resistance with 2-bit input discretization, which incurs an adversarial accuracy loss of just ~1-2% compared to clean test accuracy.
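A minimal sketch of the input discretization step, assuming a simple uniform quantizer over inputs scaled to [0, 1] (the exact quantization scheme used in the paper may differ):

```python
import torch

def discretize_input(x, bits=2):
    """Uniformly quantize inputs in [0, 1] to 2**bits levels (2-bit => 4 levels)."""
    levels = 2 ** bits - 1
    return torch.round(x.clamp(0.0, 1.0) * levels) / levels

# Example: an 8-bit image scaled to [0, 1] is mapped onto the 4 levels
# {0, 1/3, 2/3, 1} before being fed to the (possibly binarized) network.
```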
The recent advent of the 'Internet of Things' (IoT) has increased the demand for AI-based edge computing, which has necessitated the search for implementations of neural networks that are efficient in terms of both computation and storage. Although extreme quantization has proven to be a powerful tool for achieving significant compression over full-precision networks, it can result in substantial degradation in performance. In this work, we propose extremely quantized hybrid network architectures with both binary and full-precision sections to emulate the classification performance of full-precision networks while ensuring significant energy efficiency and memory compression. We explore several hybrid network architectures and analyze their performance in terms of accuracy, energy efficiency, and memory compression. We perform our analysis on ResNet and VGG network architectures. Among the proposed architectures, we show that hybrid networks with full-precision residual connections emerge as the optimum, attaining accuracies close to full-precision networks while achieving excellent memory compression, up to 21.8x in the case of VGG-19. This work demonstrates an effective way of hybridizing networks that achieves performance close to full-precision networks while attaining significant compression, furthering the feasibility of using such networks for energy-efficient neural computing in IoT-based edge devices.
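The following sketch shows the general idea of a residual block whose convolutions use binarized weights while the shortcut stays full-precision; the XNOR-Net-style scaling and the block layout are assumptions for illustration, not the paper's exact architecture (training would additionally need a straight-through estimator for the sign operation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridResidualBlock(nn.Module):
    """Binary-weight convolutions in the main path, full-precision residual shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)

    @staticmethod
    def _binary_conv(x, weight):
        # Scale-by-mean binarization (XNOR-Net style): sign(w) * mean(|w|).
        scale = weight.abs().mean()
        return F.conv2d(x, weight.sign() * scale, padding=1)

    def forward(self, x):
        out = F.relu(self.bn1(self._binary_conv(x, self.conv1.weight)))
        out = self.bn2(self._binary_conv(out, self.conv2.weight))
        return F.relu(out + x)  # full-precision residual connection
```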