Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

75 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Haichen Shen

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Haichen Shen - Jared Roesch - Zhi Chen

لغات البرمجة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes--assumptions which are violated by dynamic neural networks. Therefore, executing dynamic models with deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is more challenging than static neural networks; optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a light-weight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs.

قيم البحث

75 - Tian Jin , Gheorghe-Teodor Bercea , Tung D. Le 2020

Deep neural network models are becoming increasingly popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing. Machine learning models are commonly trained in a resource-rich environment and then deployed in a distinct environment such as high availability machines or edge devices. To assist the portability of models, the open-source community has proposed the Open Neural Network Exchange (ONNX) standard. In this paper, we present a high-level, preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models described in the ONNX format. Onnx-mlir is an open-source compiler implemented using the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated in the LLVM project. Onnx-mlir relies on the MLIR concept of dialects to implement its functionality. We propose here two new dialects: (1) an ONNX specific dialect that encodes the ONNX standard semantics, and (2) a loop-based dialect to provide for a common lowering point for all ONNX dialect operations. Each intermediate representation facilitates its own characteristic set of graph-level and loop-based optimizations respectively. We illustrate our approach by following several models through the proposed representations and we include some early optimization work and performance results.

لغات البرمجة التعلم الآلي

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

195 - Jiayi Wei , Maruth Goyal , Greg Durrett 2020

As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching, these anno tations cannot be fully determined by compilers and are tedious to annotate by hand. This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. Our approach first uses lightweight source code analysis to generate a program abstraction called a type dependency graph, which links type variables with logical constraints as well as name and usage information. Given this program abstraction, we then use a graph neural network to propagate information between related type variables and eventually make type predictions. Our neural architecture can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training. Our experimental results show that our approach outperforms prior work in this space by $14%$ (absolute) on library types, while having the ability to make type predictions that are out of scope for existing techniques.

لغات البرمجة التعلم الآلي التعلم الالي

S2DNAS:Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search

308 - Zhihang Yuan , Bingzhe Wu , Zheng Liang 2019

Recently, dynamic inference has emerged as a promising way to reduce the computational cost of deep convolutional neural network (CNN). In contrast to static methods (e.g. weight pruning), dynamic inference adaptively adjusts the inference process ac cording to each input sample, which can considerably reduce the computational cost on easy samples while maintaining the overall model performance. In this paper, we introduce a general framework, S2DNAS, which can transform various static CNN models to support dynamic inference via neural architecture search. To this end, based on a given CNN model, we first generate a CNN architecture space in which each architecture is a multi-stage CNN generated from the given model using some predefined transformations. Then, we propose a reinforcement learning based approach to automatically search for the optimal CNN architecture in the generated space. At last, with the searched multi-stage network, we can perform dynamic inference by adaptively choosing a stage to evaluate for each sample. Unlike previous works that introduce irregular computations or complex controllers in the inference or re-design a CNN model from scratch, our method can generalize to most of the popular CNN architectures and the searched dynamic network can be directly deployed using existing deep learning frameworks in various hardware devices.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

NADS-Net: A Nimble Architecture for Driver and Seat Belt Detection via Convolutional Neural Networks

84 - Sehyun Chun , Nima Hamidi Ghalehjegh , Joseph B. Choi 2019

A new convolutional neural network (CNN) architecture for 2D driver/passenger pose estimation and seat belt detection is proposed in this paper. The new architecture is more nimble and thus more suitable for in-vehicle monitoring tasks compared to ot her generic pose estimation algorithms. The new architecture, named NADS-Net, utilizes the feature pyramid network (FPN) backbone with multiple detection heads to achieve the optimal performance for driver/passenger state detection tasks. The new architecture is validated on a new data set containing video clips of 100 drivers in 50 driving sessions that are collected for this study. The detection performance is analyzed under different demographic, appearance, and illumination conditions. The results presented in this paper may provide meaningful insights for the autonomous driving research community and automotive industry for future algorithm development and data collection.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Dynamic Type Inference for Gradual Hindley--Milner Typing

86 - Yusuke Miyazaki , Taro Sekiyama , Atsushi Igarashi 2018

Garcia and Cimini study a type inference problem for the ITGL, an implicitly and gradually typed language with let-polymorphism, and develop a sound and complete inference algorithm for it. Soundness and completeness mean that, if the algorithm succe eds, the input term can be translated to a well-typed term of an explicitly typed blame calculus by cast insertion and vice versa. However, in general, there are many possible translations depending on how type variables that were left undecided by static type inference are instantiated with concrete static types. Worse, the translated terms may behave differently---some evaluate to values but others raise blame. In this paper, we propose and formalize a new blame calculus $lambda^{textsf{DTI}}_{textsf{B}}$ that avoids such divergence as an intermediate language for the ITGL. A main idea is to allow a term to contain type variables (that have not been instantiated during static type inference) and defer instantiation of these type variables to run time. We introduce dynamic type inference (DTI) into the semantics of $lambda^{textsf{DTI}}_{textsf{B}}$ so that type variables are instantiated along reduction. The DTI-based semantics not only avoids the divergence described above but also is sound and complete with respect to the semantics of fully instantiated terms in the following sense: if the evaluation of a term succeeds (i.e., terminates with a value) in the DTI-based semantics, then there is a fully instantiated version of the term that also succeeds in the explicitly typed blame calculus and vice versa. Finally, we prove the gradual guarantee, which is an important correctness criterion of a gradually typed language, for the ITGL.

لغات البرمجة المنطق في علوم الحاسوب