
Training on the Edge: The why and the how

Published by: Navjot Kukreja
Publication date: 2019
Research field: Informatics Engineering
Paper language: English

Edge computing is the natural progression from Cloud computing: instead of collecting all data and processing it centrally, as in a cloud computing environment, we distribute the computing power and try to do as much processing as possible close to the source of the data. This model is being adopted quickly for various reasons, including privacy and reduced power and bandwidth requirements on the Edge nodes. While inference on Edge nodes is common today, training on the Edge is much less so; the reasons range from computational limitations to training offering no advantage in reducing communication between Edge nodes. In this paper, we explore scenarios where training on the Edge is advantageous, as well as the use of checkpointing strategies to save memory.
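The checkpointing strategy mentioned in the abstract trades recomputation for memory: rather than storing every intermediate activation for the backward pass, only a subset is kept and the rest are recomputed when needed. Below is a minimal sketch of the idea using PyTorch's torch.utils.checkpoint; the framework choice and the toy model are illustrative assumptions, not something the paper prescribes.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A small stand-in model; real Edge workloads would be larger.
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(32, 128, requires_grad=True)

# Split the network into 2 segments: only activations at segment
# boundaries are stored; the rest are recomputed during backward,
# trading extra compute for a smaller memory footprint.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```

Choosing the number of segments sets the memory/compute trade-off: more segments mean less stored state but more recomputation, which matters on memory-constrained Edge hardware.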


Read also

Over the last decades quaternions have become a crucial and very successful tool for attitude representation in robotics and aerospace. However, there is a major problem that is continuously causing trouble in practice when it comes to exchanging formulas or implementations: there are two quaternion multiplications in common use, Hamilton's original multiplication and its flipped version, which is often associated with NASA's Jet Propulsion Laboratory. We believe that this particular issue is completely avoidable and only exists today due to a lack of understanding. This paper explains the underlying problem for the popular passive world-to-body usage of rotation quaternions, and derives an alternative solution compatible with Hamilton's multiplication. Furthermore, it argues for entirely discontinuing the flipped multiplication. Additionally, it provides recipes for efficiently detecting relevant conventions and migrating formulas or algorithms between them.
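To make the ambiguity concrete, the sketch below contrasts the two products. The (w, x, y, z) component layout is an assumption for illustration; the paper itself gives the full derivations and migration recipes.

```python
import numpy as np

def hamilton(q1, q2):
    # Hamilton product (ij = +k), quaternions as (w, x, y, z).
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def flipped(q1, q2):
    # The flipped ("JPL", ij = -k) product equals the Hamilton
    # product with its operands swapped, which is why mixing the
    # two conventions silently reverses composition order.
    return hamilton(q2, q1)
```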
The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the concurrent storage of high-precision activations for all layers, generally making learning on memory-constrained devices infeasible. In this paper, we demonstrate that the backward propagation operations needed for binary neural network training are strongly robust to quantization, thereby making on-the-edge learning with modern models a practical proposition. We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint and energy reductions while inducing little to no accuracy loss versus Courbariaux & Bengio's standard approach. These resource decreases are primarily enabled through the retention of activations exclusively in binary format. Against the latter algorithm, our drop-in replacement sees coincident memory requirement and energy consumption drops of 2-6×, while reaching similar test accuracy in comparable time, across a range of small-scale models trained to classify popular datasets. We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.12× memory reduction. Such savings will allow for unnecessary cloud offloading to be avoided, reducing latency, increasing energy efficiency and safeguarding privacy.
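To illustrate the kind of saving involved, here is a minimal sketch of a sign activation with a clipped straight-through estimator, whose backward pass needs only a one-bit mask per element instead of the full-precision activation. PyTorch and this particular estimator are illustrative assumptions; the paper's actual algorithm differs in detail.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    # Sign binarization with a clipped straight-through estimator.
    @staticmethod
    def forward(ctx, x):
        # Keep only a boolean mask (conceptually 1 bit per element;
        # a real implementation would bit-pack it) instead of the
        # full-precision activation x.
        ctx.save_for_backward(x.abs().le(1.0))
        return x.sign()

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1.
        return grad_out * mask

x = torch.randn(8, requires_grad=True)
y = BinarizeSTE.apply(x)
y.sum().backward()
```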
Edge Computing exploits computational capabilities deployed at the very edge of the network to support applications with low latency requirements. Such capabilities can reside in small embedded devices that integrate dedicated hardware (e.g., a GPU) in a low-cost package. But these devices have limited computing capabilities compared to standard server-grade equipment. When deploying an Edge Computing based application, understanding whether the available hardware can meet target requirements is key in meeting the expected performance. In this paper, we study the feasibility of deploying Augmented Reality applications using Embedded Edge Devices (EEDs). We compare such a deployment approach to one exploiting a standard dedicated server-grade machine. Starting from an empirical evaluation of the capabilities of these devices, we propose a simple theoretical model to compare the performance of the two approaches. We then validate this model with NS-3 simulations and study their feasibility. Our results show that there is no one-size-fits-all solution. Deploying high-responsiveness applications requires a centralized server-grade architecture, and even then only very few users can be supported. The centralized architecture fails to serve a larger number of users even when only low to mid responsiveness is required; in that case, we need to resort instead to a distributed deployment based on EEDs.
This paper describes how to augment techniques such as Distributed Shared Memory with recent trends on disaggregated Non Volatile Memory in the data centre so that the combination can be used in an edge environment with potentially volatile and mobile resources. This article identifies the main advantages and challenges, and offers an architectural evolution to incorporate recent research trends into production-ready disaggregated edges. We also present two prototypes showing the feasibility of this proposal.
Distributed digital infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex applications to be executed from IoT Edge devices to the HPC Cloud (aka the Computing Continuum, the Digital Continuum, or the Transcontinuum). Understanding end-to-end performance in such a complex continuum is challenging. This breaks down to reconciling many, typically contradicting application requirements and constraints with low-level infrastructure design choices. One important challenge is to accurately reproduce relevant behaviors of a given application workflow and representative settings of the physical infrastructure underlying this complex continuum. We introduce a rigorous methodology for such a process and validate it through E2Clab. It is the first platform to support the complete experimental cycle across the Computing Continuum: deployment, analysis, optimization. Preliminary results with real-life use cases show that E2Clab allows one to understand and improve performance, by correlating it to the parameter settings, the resource usage and the specifics of the underlying infrastructure.