أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Soroush Bateni

Quantifying and Generalizing the CAP Theorem

169 - Edward A. Lee , Soroush Bateni , Shaokai Lin 2021

In distributed applications, Brewers CAP theorem tells us that when networks become partitioned, there is a tradeoff between consistency and availability. Consistency is agreement on the values of shared variables across a system, and availability is the ability to respond to reads and writes accessing those shared variables. We quantify these concepts, giving numerical values to inconsistency and unavailability. Recognizing that network partitioning is not an all-or-nothing proposition, we replace the P in CAP with L, a numerical measure of apparent latency, and derive the CAL theorem, an algebraic relation between inconsistency, unavailability, and apparent latency. This relation shows that if latency becomes unbounded (e.g., the network becomes partitioned), then one of inconsistency and unavailability must also become unbounded, and hence the CAP theorem is a special case of the CAL theorem. We describe two distributed coordination mechanisms, which we have implemented as an extension of the Lingua Franca coordination language, that support arbitrary tradeoffs between consistency and availability as apparent latency varies. With centralized coordination, inconsistency remains bounded by a chosen numerical value at the cost that unavailability becomes unbounded under network partitioning. With decentralized coordination, unavailability remains bounded by a chosen numerical quantity at the cost that inconsistency becomes unbounded under network partitioning. Our centralized coordination mechanism is an extension of techniques that have historically been used for distributed simulation, an application where consistency is paramount. Our decentralized coordination mechanism is an extension of techniques that have been used in distributed databases when availability is paramount.

النظم الموزعة والتوازية والحوسبة العنقودية

LINTS^RT: A Learning-driven Testbed for Intelligent Scheduling in Embedded Systems

71 - Zelun Kong , Yaswanth Yadlapalli , Soroush Bateni 2020

Due to the increasing complexity seen in both workloads and hardware resources in state-of-the-art embedded systems, developing efficient real-time schedulers and the corresponding schedulability tests becomes rather challenging. Although close to op timal schedulability performance can be achieved for supporting simple system models in practice, adding any small complexity element into the problem context such as non-preemption or resource heterogeneity would cause significant pessimism, which may not be eliminated by any existing scheduling technique. In this paper, we present LINTS^RT, a learning-based testbed for intelligent real-time scheduling, which has the potential to handle various complexities seen in practice. The design of LINTS^RT is fundamentally motivated by AlphaGo Zero for playing the board game Go, and specifically addresses several critical challenges due to the real-time scheduling context. We first present a clean design of LINTS^RT for supporting the basic case: scheduling sporadic workloads on a homogeneous multiprocessor, and then demonstrate how to easily extend the framework to handle further complexities such as non-preemption and resource heterogeneity. Both application and OS-level implementation and evaluation demonstrate that LINTS^RT is able to achieve significantly higher runtime schedulability under different settings compared to perhaps the most commonly applied schedulers, global EDF, and RM. To our knowledge, this work is the first attempt to design and implement an extensible learning-based testbed for autonomously making real-time scheduling decisions.

أنظمة التشغيل

Co-Optimizing Performance and Memory FootprintVia Integrated CPU/GPU Memory Management, anImplementation on Autonomous Driving Platform

219 - Soroush Bateni , Zhendong Wang , Yuankun Zhu 2020

Cutting-edge embedded system applications, such as self-driving cars and unmanned drone software, are reliant on integrated CPU/GPU platforms for their DNNs-driven workload, such as perception and other highly parallel components. In this work, we se t out to explore the hidden performance implication of GPU memory management methods of integrated CPU/GPU architecture. Through a series of experiments on micro-benchmarks and real-world workloads, we find that the performance under different memory management methods may vary according to application characteristics. Based on this observation, we develop a performance model that can predict system overhead for each memory management method based on application characteristics. Guided by the performance model, we further propose a runtime scheduler. By conducting per-task memory management policy switching and kernel overlapping, the scheduler can significantly relieve the system memory pressure and reduce the multitasking co-run response time. We have implemented and extensively evaluated our system prototype on the NVIDIA Jetson TX2, Drive PX2, and Xavier AGX platforms, using both Rodinia benchmark suite and two real-world case studies of drone software and autonomous driving software.

النظم الموزعة والتوازية والحوسبة العنقودية أنظمة التشغيل

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد