
Techreport: Time-sensitive probabilistic inference for the edge

Added by Christian Weilbach
Publication date: 2017
Language: English





In recent years the two trends of edge computing and artificial intelligence have both become crucial for information-processing infrastructures. While the centralized analysis of massive amounts of data seems to be at odds with computation on the outer edge of distributed systems, we explore the properties of eventually consistent systems and statistics to identify sound formalisms for probabilistic inference on the edge. In particular, we treat time itself as a random variable that we incorporate into statistical models through probabilistic programming.
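The abstract gives no concrete model, but the core idea of treating time itself as a random variable can be sketched in a few lines. The toy model below (plain Python/NumPy; the uniform prior, the exponential delay model, and the grid posterior are illustrative assumptions, not taken from the report) infers the latent time at which an update happened on an edge node from the delayed timestamps at which replicas of an eventually consistent system observed it:

```python
# A minimal, self-contained sketch (NumPy only) of treating time as a random
# variable: the latent time at which an update happened on an edge node is
# inferred from the delayed timestamps at which replicas observed it.
# All modelling choices (uniform prior, exponential delays, grid posterior)
# are illustrative assumptions, not taken from the report.
import numpy as np

rng = np.random.default_rng(0)

true_event_time = 3.0          # unknown in practice
delay_rate = 2.0               # mean network delay = 1 / delay_rate seconds
observed = true_event_time + rng.exponential(1.0 / delay_rate, size=5)

# Prior: event time uniform on [0, 10]; posterior evaluated on a grid.
grid = np.linspace(0.0, 10.0, 2001)

def log_likelihood(t):
    """Each observed timestamp is the event time plus an exponential delay."""
    delays = observed - t
    if np.any(delays < 0):          # an observation cannot precede the event
        return -np.inf
    return np.sum(np.log(delay_rate) - delay_rate * delays)

log_post = np.array([log_likelihood(t) for t in grid])   # flat prior: likelihood only
log_post -= log_post.max()
post = np.exp(log_post)
post /= post.sum()

mean_estimate = np.sum(grid * post)
print(f"observed timestamps: {np.round(observed, 2)}")
print(f"posterior mean of event time: {mean_estimate:.2f} (true: {true_event_time})")
```

In a full probabilistic programming setting the same likelihood would be declared inside the model and the grid replaced by a generic inference backend such as MCMC or SMC.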



Related research

A surge in artificial intelligence and autonomous technologies has increased the demand for enhanced edge-processing capabilities. The computational complexity and size of state-of-the-art Deep Neural Networks (DNNs) are rising exponentially with diverse network models and larger datasets. This growth limits the performance scaling and energy efficiency of both distributed and embedded inference platforms. Embedded designs at the edge are constrained by the energy and speed limitations of available processor substrates and by the processor-to-memory communication required to fetch model coefficients. While many hardware accelerators and network deployment frameworks are in development, a framework is needed that allows the variety of existing and emerging architectures to be expressed in the critical parts of the flow that perform the various optimization steps. Moreover, premature, architecture-blind network selection and optimization diminish the effectiveness of schedule optimizations and hardware-specific mappings. In this paper, we address these issues by creating a cross-layer software-hardware design framework that encompasses network training and model compression and that is aware of and tuned to the underlying hardware architecture. This approach leverages the available degrees of DNN structure and sparsity to create a converged network that can be partitioned and efficiently scheduled on the target hardware platform, minimizing data movement and improving overall throughput and energy. To further streamline the design, we leverage a high-level, flexible SoC generator platform based on the RISC-V ROCC framework. This integration allows seamless extensions of the RISC-V instruction set and Chisel-based rapid generator design. Using this approach, we implemented a silicon prototype in a 16 nm TSMC process node, achieving a record processing efficiency of up to 18 TOPS/W.
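The abstract describes the framework only at a high level. As a loose illustration of one ingredient of such a hardware-aware compression flow, the sketch below (NumPy only; the 8x8 tile, the sparsity target, and the L1 scoring rule are invented placeholders, not the paper's method) prunes a weight matrix in whole accelerator-aligned blocks rather than individual elements, so that the surviving structure maps directly onto skipped compute tiles at schedule time:

```python
# Illustrative block-structured pruning (NumPy only): weights are removed in
# whole tiles matching a hypothetical accelerator's 8x8 compute block, so
# sparsity translates directly into skipped tiles at schedule time.
# The tile size, sparsity target, and scoring rule are assumptions.
import numpy as np

TILE = 8            # hypothetical accelerator tile (rows x cols)
SPARSITY = 0.5      # fraction of tiles to drop

def block_prune(weights: np.ndarray, tile: int = TILE, sparsity: float = SPARSITY):
    rows, cols = weights.shape
    assert rows % tile == 0 and cols % tile == 0, "pad weights to a tile multiple"
    # Score each tile by its L1 norm and zero out the weakest ones.
    tiles = weights.reshape(rows // tile, tile, cols // tile, tile)
    scores = np.abs(tiles).sum(axis=(1, 3))             # one score per tile
    n_drop = int(sparsity * scores.size)
    threshold = np.sort(scores, axis=None)[n_drop]
    mask = (scores >= threshold)[:, None, :, None]      # broadcast back to elements
    return (tiles * mask).reshape(rows, cols), mask[:, 0, :, 0]

w = np.random.default_rng(1).normal(size=(64, 64))
pruned, tile_mask = block_prune(w)
print(f"tiles kept: {tile_mask.sum()} / {tile_mask.size}")
```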
Computational storage, known as a solution that significantly reduces latency by moving data processing down to the storage device, has received wide attention because of its potential to accelerate data-driven devices at the edge. To meet the insatiable appetite for complicated functionalities tailored to intelligent devices such as autonomous vehicles, properties including heterogeneity, scalability, and flexibility are becoming increasingly important. Based on our prior work on hierarchical erasure coding, which enables scalability and flexibility in cloud storage, we develop an efficient decoding algorithm that corrects a mixture of errors and erasures simultaneously. We first extract the basic component code, the so-called extended Cauchy (EC) codes, of the proposed coding solution. We prove that the class of EC codes is strictly larger than that of relevant codes with known explicit decoding algorithms. Motivated by this finding, we then develop an efficient decoding method for the general class of EC codes, based on which we propose the local and global decoding algorithms for the hierarchical codes. Our proposed hybrid error correction not only enables the use of hierarchical codes in computational storage at the edge, but also applies to any Cauchy-like codes and allows potentially wider applications of EC codes.
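The hierarchical codes and the hybrid error-and-erasure decoder are beyond a short example, but the Cauchy construction at their base can be illustrated compactly. The sketch below (plain Python; exact rationals stand in for the finite-field arithmetic, GF(2^w), that a real implementation would use, and all parameters are made up) encodes four data symbols into six with a Cauchy matrix and recovers the data from any four survivors, relying on the fact that every square submatrix of a Cauchy matrix is invertible:

```python
# Minimal sketch of an erasure code built from a Cauchy matrix.  Real
# computational-storage codes (including the extended Cauchy codes in the
# paper) work over a finite field GF(2^w); exact rationals are used here
# only to keep the illustration short.  All parameters are made up.
from fractions import Fraction

k, n = 4, 6                                    # 4 data symbols, 2 parity symbols

# Cauchy matrix G[i][j] = 1 / (x_i + y_j): every square submatrix is invertible,
# which is exactly the property that lets any k surviving symbols recover the data.
xs = [Fraction(i) for i in range(n)]
ys = [Fraction(n + j) for j in range(k)]
G = [[1 / (x + y) for y in ys] for x in xs]

data = [Fraction(v) for v in (3, 1, 4, 1)]
codeword = [sum(G[i][j] * data[j] for j in range(k)) for i in range(n)]

# Pretend symbols 1 and 3 were lost; decode from any k = 4 survivors.
survivors = [0, 2, 4, 5]

def solve_exact(A, b):
    """Gauss-Jordan elimination over exact rationals."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b2 for a, b2 in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

recovered = solve_exact([G[i] for i in survivors], [codeword[i] for i in survivors])
print(recovered)   # [Fraction(3, 1), Fraction(1, 1), Fraction(4, 1), Fraction(1, 1)]
```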
As a key technology in the 5G era, Mobile Edge Computing (MEC) has developed rapidly in recent years. MEC aims to reduce the service delay of mobile users while alleviating the processing pressure on the core network. MEC can be regarded as an extension of cloud computing on the user side: it deploys edge servers that bring computing resources closer to mobile users and provide more efficient interactions. However, due to users' dynamic mobility, the distance between a user and the edge server changes dynamically, which may cause fluctuations in Quality of Service (QoS). Therefore, when a mobile user moves in the MEC environment, certain approaches are needed to schedule the services deployed on edge servers so as to preserve the user experience. In this paper, we model service scheduling in MEC scenarios and propose a delay-aware and mobility-aware service management approach based on concise probabilistic methods. This approach has low computational complexity and can effectively reduce service delay and migration costs. Furthermore, we conduct experiments using multiple realistic datasets and use iFogSim to evaluate the performance of the algorithm. The results show that our proposed approach can improve service delay by 8% to 20% and reduce migration costs by more than 75% compared with baselines during rush hours.
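The paper's concise probabilistic method is not spelled out in the abstract. As a rough stand-in, the toy rule below (plain Python; the delay model, the stay probability, and every number are illustrative assumptions, not the paper's algorithm) migrates a service to a closer edge server only when the expected delay saving over the user's predicted residence time outweighs the one-off migration cost:

```python
# Toy delay-aware, mobility-aware migration rule (not the paper's algorithm).
# A service is migrated to the edge server nearest the user only when the
# expected delay saving over the user's predicted residence time outweighs
# the one-off migration cost.  All numbers below are illustrative.

def expected_delay(distance_km: float) -> float:
    """Assumed delay model: base processing delay plus distance-proportional network delay (ms)."""
    return 5.0 + 0.8 * distance_km

def should_migrate(dist_current: float, dist_candidate: float,
                   stay_probability: float, expected_requests: float,
                   migration_cost_ms: float) -> bool:
    """stay_probability: estimated probability the user remains in the candidate
    server's coverage area (a simple stand-in for a mobility model)."""
    saving_per_request = expected_delay(dist_current) - expected_delay(dist_candidate)
    expected_saving = stay_probability * expected_requests * saving_per_request
    return expected_saving > migration_cost_ms

# The user has drifted 6 km from the hosting server but is 1 km from another one.
print(should_migrate(dist_current=6.0, dist_candidate=1.0,
                     stay_probability=0.7, expected_requests=120,
                     migration_cost_ms=250.0))   # True: expected saving 336 ms > 250 ms
```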
Recent advancements in three-dimensional (3D) data acquisition devices have spurred a new breed of applications that rely on point cloud data processing. However, processing large volumes of point cloud data places a significant workload on resource-constrained mobile devices, preventing them from reaching their full potential. Built upon the emerging paradigm of device-edge co-inference, where an edge device extracts and transmits an intermediate feature to an edge server for further processing, we propose Branchy-GNN for efficient graph neural network (GNN) based point cloud processing by leveraging edge computing platforms. To reduce the on-device computational cost, Branchy-GNN adds branch networks for early exiting. Besides, it employs learning-based joint source-channel coding (JSCC) to compress the intermediate feature and reduce the communication overhead. Our experimental results demonstrate that the proposed Branchy-GNN secures a significant latency reduction compared with several benchmark methods.
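As a schematic of the two ideas named in the abstract, early exits gated by prediction confidence and a learned code that compresses the intermediate feature before transmission, the sketch below uses random linear layers in place of trained GNN blocks and the JSCC encoder; the threshold and dimensions are placeholders, not Branchy-GNN's trained components:

```python
# Schematic of device-edge co-inference with early exit and feature compression
# (NumPy only).  The layers are random matrices standing in for trained GNN
# blocks and the JSCC encoder; thresholds and dimensions are placeholders.
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, FEAT, CODE = 10, 64, 16

branch_head = rng.normal(size=(FEAT, NUM_CLASSES))   # on-device early-exit classifier
jscc_encoder = rng.normal(size=(FEAT, CODE))         # compresses feature before transmission

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def device_side(feature: np.ndarray, confidence_threshold: float = 0.9):
    """Exit early if the branch classifier is confident; otherwise send a compressed feature."""
    probs = softmax(feature @ branch_head)
    if probs.max() >= confidence_threshold:
        return {"exit": "device", "prediction": int(probs.argmax())}
    return {"exit": "edge", "payload": feature @ jscc_encoder}   # 16 floats instead of 64

feature = rng.normal(size=FEAT)   # stand-in for an intermediate point-cloud feature
print(device_side(feature))
```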
Artificial Intelligence (AI) and Internet of Things (IoT) applications are growing rapidly in today's world, where they are continuously connected to the internet and process, store, and exchange information among devices and the environment. Cloud and edge platforms are crucial to these applications because of their inherently compute-intensive and resource-constrained nature. One of the foremost challenges in cloud and edge resource allocation is the efficient management of computation and communication resources to meet the performance and latency guarantees of the applications. The heterogeneity of cloud resources (processors, memory, storage, bandwidth), variable cost structures, and unpredictable workload patterns make the design of resource allocation techniques complex. Numerous research studies have been carried out to address this intricate problem. In this paper, we review the current state-of-the-art resource allocation techniques for the cloud continuum, in particular those that consider time-sensitive applications. Furthermore, we present the key challenges in the resource allocation problem for the cloud continuum, a taxonomy to classify the existing literature, and the potential research gaps.