Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of edge devices constrain the effectiveness of DNN algorithms, which makes developing edge-oriented AI algorithms and implementations (e.g., accelerators) challenging. In this paper, we summarize our recent efforts toward efficient on-device AI development from three aspects that cover both training and inference. First, we present on-device training with ultra-low memory usage: we propose a novel rank-adaptive tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low bitwidth quantization method for DNN model compression that achieves state-of-the-art accuracy under the same compression ratio. Third, we introduce an ultra-low latency DNN accelerator design that follows the software/hardware co-design methodology. This paper emphasizes the importance and efficacy of training, quantization, and accelerator design, and calls for more research breakthroughs in AI on the edge.
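To give a rough sense of where an orders-of-magnitude memory reduction can come from, the sketch below compares the parameter count of a dense weight matrix with that of a tensor-train (TT) factorization of the same layer. The TT format, the mode sizes, and the fixed ranks are assumptions chosen for illustration only; the abstract does not specify the tensor format or how the ranks are adapted, and the function names are hypothetical.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameter count of a dense layer of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Parameter count of a tensor-train (TT) matrix factorization.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    The boundary ranks ranks[0] and ranks[-1] must both be 1.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the mode lists")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Assumed example: a 1024 x 1024 layer factorized with modes 4 x 8 x 8 x 4.
in_modes = (4, 8, 8, 4)
out_modes = (4, 8, 8, 4)
ranks = (1, 8, 8, 8, 1)  # fixed here; a rank-adaptive method would tune these

dense = dense_params(in_modes, out_modes)
tt = tt_params(in_modes, out_modes, ranks)
print(f"dense: {dense}, TT: {tt}, ratio: {dense / tt:.1f}x")
```

With these assumed sizes the dense layer stores 1,048,576 weights while the TT cores store 8,448, roughly a 124x reduction. Lower ranks shrink the cores further at the cost of representational capacity, which is the trade-off a rank-adaptive scheme would have to manage.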
Effective Capacity defines the maximum communication rate subject to a specific delay constraint, while effective energy efficiency (EEE) indicates the ratio between effective capacity and power consumption. We analyze the EEE of ultra-reliable netwo
Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs prevent tra
Inspired by recent work on extended image volumes that lays the ground for randomized probing of extremely large seismic wavefield matrices, we present a memory frugal and computationally efficient inversion methodology that uses techniques from rand
Traditional link adaptation (LA) schemes in cellular networks must be revised for networks beyond the fifth generation (b5G) to guarantee the strict latency and reliability requirements advocated by ultra-reliable low-latency communications (URLLC).
Considering a Manhattan mobility model in vehicle-to-vehicle networks, this work studies a power minimization problem subject to second-order statistical constraints on latency and reliability, captured by a network-wide maximal data queue length. We