We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS4ML, we designed a set of new parameterized interface circuits synthesizable with high-level synthesis. For accelerator configuration and management, we developed an embedded software runtime system on top of Linux. With this HW/SW layer, we addressed the challenge of dynamically shaping the data traffic on a network-on-chip to activate and support the reconfigurable pipelines of accelerators that are needed by the application workloads currently running on the SoC. We demonstrate our vertically-integrated contributions with the FPGA-based implementations of complete SoC instances booting Linux and executing computer-vision applications that process images taken from the Google Street View database.
Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and the associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. Also, the embedded-device ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems and that allow for cross-platform interoperability, including dynamic memory allocation and virtual memory. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. Also, we present an evaluation to demonstrate its low resource requirement and minimal run-time performance overhead.
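To make the memory constraint concrete, the following is a minimal sketch of one common way to run without dynamic memory allocation: every buffer is carved out of a single preallocated region (an arena). It is written in Python purely for illustration. The `Arena` class, its `allocate` method, and the 16-byte alignment are hypothetical names and values chosen for this sketch; they are not the TF Micro API, and the abstract does not specify the framework's actual allocation scheme.

```python
class Arena:
    """Bump allocator over one fixed buffer (hypothetical illustration).

    The buffer is created once, up front. Every later request is served by
    advancing an offset inside that buffer, so the total memory footprint is
    known before inference starts.
    """

    def __init__(self, size_bytes: int, alignment: int = 16):
        self.buffer = bytearray(size_bytes)  # the only backing storage
        self.alignment = alignment           # assumed alignment requirement
        self.offset = 0                      # first free byte

    def allocate(self, nbytes: int) -> memoryview:
        # Round the start address up to the next multiple of the alignment.
        start = -(-self.offset // self.alignment) * self.alignment
        end = start + nbytes
        if end > len(self.buffer):
            raise MemoryError(
                f"arena exhausted: need {end} bytes, have {len(self.buffer)}"
            )
        self.offset = end
        # A view into the arena, not a copy.
        return memoryview(self.buffer)[start:end]


# A 64-byte arena serving two small tensors.
arena = Arena(64, alignment=16)
input_tensor = arena.allocate(10)   # occupies bytes 0..9
output_tensor = arena.allocate(20)  # aligned up to byte 16, occupies 16..35
input_tensor[0] = 7                 # writes land directly in the arena
```

A request that does not fit fails immediately with `MemoryError` instead of growing the heap, which is the behaviour one would want on a device with only a few kilobytes of RAM.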
Businesses, particularly small and medium-sized enterprises, aiming to start up in Model-Based Design (MBD) face difficult choices from a wide range of methods, notations and tools before making the significant investments in planning, procurement and training necessary to deploy new approaches successfully. In the development of Cyber-Physical Systems (CPSs) this is exacerbated by the diversity of formalisms covering computation, physical and human processes. In this paper, we propose the use of a cloud-enabled and open collaboration platform that allows businesses to offer models, tools and other assets, and permits others to access these on a pay-per-use basis as a means of lowering barriers to the adoption of MBD technology, and to promote experimentation in a sandbox environment.
Triple Modular Redundancy (TMR) is a suitable fault-tolerant technique for SRAM-based FPGAs. However, one of the main challenges in achieving 100% robustness in designs protected by TMR running on programmable platforms is to prevent upsets in the routing from provoking undesirable connections between signals from distinct redundant logic parts, which can generate an error in the output. This paper investigates the optimal design of the TMR logic (e.g., by cleverly inserting voters) to ensure robustness. Four differe
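The voting principle behind TMR, and the reason cross-domain upsets are dangerous, can be sketched in a few lines of Python. This is an illustrative software model under simplifying assumptions, not the design studied in the paper: the helper names `majority_vote`, `tmr_run`, and `increment` are hypothetical, and a fault is modelled simply as a bit mask XORed into the output of one redundant copy.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_run(module, x: int, upset_masks=(0, 0, 0)) -> int:
    """Run three copies of `module` and vote on their outputs.

    `upset_masks` holds one XOR mask per redundant copy; a non-zero mask
    flips the corresponding output bits of that copy (assumed fault model).
    """
    outputs = [module(x) ^ mask for mask in upset_masks]
    return majority_vote(*outputs)


def increment(x: int) -> int:
    """Example 8-bit combinational function protected by TMR."""
    return (x + 1) & 0xFF


golden = increment(41)

# An upset confined to one redundant domain is masked by the voter.
single_fault = tmr_run(increment, 41, upset_masks=(0b0100, 0, 0))

# The same upset reaching two domains (e.g. through an unintended
# connection between redundant parts) outvotes the correct copy.
double_fault = tmr_run(increment, 41, upset_masks=(0b0100, 0b0100, 0))
```

Here `single_fault` equals `golden`, while `double_fault` does not. A single routing upset that couples two redundant domains therefore behaves like the second case, which is why voter placement matters.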
Manycore Systems-on-Chip include an increasing number of processing elements and have become an important research topic for improvements in both hardware and software. While research can be conducted using system simulators, prototyping requires a variety of components and is very time-consuming. With the Open Tiled Manycore System-on-Chip (OpTiMSoC), we aim to build such a prototyping environment for use in our own and other research projects. This paper describes the goals and key aspects of the OpTiMSoC project and summarizes its current status and ideas.
The Boltzmann Machine (BM) is a neural network composed of stochastically firing neurons that can learn complex probability distributions by adapting the synaptic interactions between the neurons. BMs represent a very generic class of stochastic neural networks that can be used for data clustering, generative modelling and deep learning. A key drawback of software-based stochastic neural networks is the required Monte Carlo sampling, which scales intractably with the number of neurons. Here, we realize a physical implementation of a BM directly in the stochastic spin dynamics of a gated ensemble of coupled cobalt atoms on the surface of semiconducting black phosphorus. Implementing the concept of orbital memory utilizing scanning tunnelling microscopy, we demonstrate the bottom-up construction of atomic ensembles whose stochastic current noise is defined by a reconfigurable multi-well energy landscape. Exploiting the anisotropic behaviour of black phosphorus, we build ensembles of atoms with two well-separated intrinsic time scales that represent neurons and synapses. By characterizing the conditional steady-state distribution of the neurons for given synaptic configurations, we illustrate that an ensemble can represent many distinct probability distributions. By probing the intrinsic synaptic dynamics, we reveal an autonomous reorganization of the synapses in response to external electrical stimuli. This self-adaptive architecture paves the way for on-chip learning directly in atomic-scale machine learning hardware.
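For readers unfamiliar with the software baseline mentioned above, the following is a minimal Python sketch of Monte Carlo (Gibbs) sampling for a small Boltzmann Machine with binary units. It assumes the standard textbook energy function E(s) = -sum_i b_i s_i - sum_{i<j} w_ij s_i s_j with symmetric weights. The function names, the two-neuron weights, and the number of sweeps are illustrative choices and are not taken from the paper.

```python
import itertools
import math
import random


def gibbs_sample(weights, biases, sweeps, rng):
    """Estimate the state distribution of a Boltzmann Machine by Gibbs sampling.

    Each unit is resampled in turn from its conditional distribution,
    P(s_i = 1 | rest) = sigmoid(b_i + sum_j w_ij * s_j).
    """
    n = len(biases)
    state = [rng.choice((0, 1)) for _ in range(n)]
    counts = {}
    for _ in range(sweeps):
        for i in range(n):
            field = biases[i] + sum(
                weights[i][j] * state[j] for j in range(n) if j != i
            )
            p_on = 1.0 / (1.0 + math.exp(-field))
            state[i] = 1 if rng.random() < p_on else 0
        key = tuple(state)
        counts[key] = counts.get(key, 0) + 1
    return {s: c / sweeps for s, c in counts.items()}


def exact_distribution(weights, biases):
    """Exact Boltzmann distribution by enumerating all 2**n states."""
    n = len(biases)
    unnormalized = {}
    for bits in itertools.product((0, 1), repeat=n):
        energy = -sum(biases[i] * bits[i] for i in range(n))
        energy -= sum(
            weights[i][j] * bits[i] * bits[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        unnormalized[bits] = math.exp(-energy)
    z = sum(unnormalized.values())  # partition function
    return {s: p / z for s, p in unnormalized.items()}


# Two coupled neurons (illustrative parameters).
weights = [[0.0, 1.5], [1.5, 0.0]]
biases = [-0.5, -0.5]

empirical = gibbs_sample(weights, biases, sweeps=20000, rng=random.Random(0))
exact = exact_distribution(weights, biases)
```

For two neurons the exact distribution is trivial to enumerate, so it serves as a check on the sampler. The enumeration grows as 2**n, and sampling needs many sweeps to approach the target distribution, which illustrates why software sampling becomes costly as networks grow.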