
Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation

Posted by Olga Poppe
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Microsoft Azure is dedicated to guaranteeing a high quality of service to its customers, in particular during periods of high customer activity, while controlling cost. We employ a Data Science (DS) driven solution to predict user load and leverage these predictions to optimize resource allocation. To this end, we built the Seagull infrastructure, which processes per-server telemetry, validates the data, and trains and deploys ML models. The models are used to predict customer load per server (24 hours into the future) and to optimize service operations. Seagull continually re-evaluates the accuracy of its predictions, falls back to previously known good models, and triggers alerts as appropriate. We deployed this infrastructure in production for PostgreSQL and MySQL servers across all Azure regions and applied it to the problem of scheduling server backups during low-load periods. This minimizes interference with user-induced load and improves customer experience.
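To make the pipeline described above concrete, here is a minimal, hypothetical sketch of its main steps: learn a per-server load profile from telemetry, validate the retrained model against recent data (falling back to the last known good model and alerting if it regresses), forecast the next 24 hours, and pick the lowest-load window for a backup. The hourly-average model, thresholds, and function names are illustrative assumptions, not Seagull's actual implementation.

```python
# Illustrative sketch only: a toy per-server load forecaster with the
# validate / fall back / pick-low-load-window steps the abstract describes.
# The model (hourly averages) and thresholds are assumptions, not Seagull's.
import numpy as np

def train_hourly_model(load_history, timestamps_hour):
    """Learn mean load per hour of day from per-server telemetry."""
    model = np.zeros(24)
    for h in range(24):
        vals = load_history[timestamps_hour == h]
        model[h] = vals.mean() if len(vals) else load_history.mean()
    return model

def forecast_next_24h(model, start_hour):
    """Predict load for each of the next 24 hours."""
    return np.array([model[(start_hour + i) % 24] for i in range(24)])

def validate_or_fallback(candidate, last_good, recent_load, recent_hours, tol=0.2):
    """Keep the new model only if its error on recent data is acceptable;
    otherwise fall back to the previously known good model (and alert)."""
    mae = np.abs(candidate[recent_hours % 24] - recent_load).mean()
    if mae > tol * recent_load.mean():
        print("ALERT: new model rejected, falling back")  # stand-in for real alerting
        return last_good
    return candidate

def pick_backup_window(forecast, window=2):
    """Choose the start hour of the lowest-load window for a backup."""
    sums = [forecast[i:i + window].sum() for i in range(24 - window + 1)]
    return int(np.argmin(sums))

# Toy usage with synthetic telemetry.
rng = np.random.default_rng(0)
hours = np.arange(24 * 14) % 24
load = 50 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)
model = train_hourly_model(load, hours)
model = validate_or_fallback(model, model, load[-48:], hours[-48:])
print("backup should start at hour", pick_backup_window(forecast_next_24h(model, 0)))
```

In production the predictor would be a real ML model trained per server, but the validate / fall back / schedule flow is the part of the infrastructure the abstract emphasizes.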


Read also

Developing modern systems software is a complex task that combines business logic programming and Software Performance Engineering (SPE). The latter is an experimental and labor-intensive activity focused on optimizing the system for a given hardware, software, and workload (hw/sw/wl) context. Today's SPE is performed during build/release phases by specialized teams, and is cursed by: 1) a lack of standardized and automated tools, 2) significant repeated work as the hw/sw/wl context changes, and 3) fragility induced by one-size-fits-all tuning (where improvements on one workload or component may impact others). The net result: despite costly investments, system software is often outside its optimal operating point, anecdotally leaving 30% to 40% of performance on the table. Recent developments in Data Science (DS) hint at an opportunity: combining DS tooling and methodologies with a new developer experience to transform the practice of SPE. In this paper we present MLOS, an ML-powered infrastructure and methodology to democratize and automate Software Performance Engineering. MLOS enables continuous, instance-level, robust, and trackable systems optimization. MLOS is being developed and employed within Microsoft to optimize SQL Server performance. Early results indicate that component-level optimizations can lead to 20%-90% improvements when custom-tuning for a specific hw/sw/wl, hinting at a significant opportunity. However, several research challenges remain that will require community involvement. To this end, we are in the process of open-sourcing the MLOS core infrastructure, and we are engaging with academic institutions to create an educational program around Software 2.0 and MLOS ideas.
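The abstract does not detail how MLOS performs instance-level tuning; purely as a hypothetical illustration of the idea, the sketch below runs a small search loop that benchmarks candidate values of one component parameter for the current hw/sw/wl instance and keeps the best configuration. The parameter, the stand-in benchmark, and the random-search strategy are assumptions for illustration only.

```python
# Hypothetical sketch of a per-instance tuning loop: measure a component under
# candidate parameter values and keep the best one for this hw/sw/wl context.
# The "buffer size" parameter and benchmark are invented for the example.
import random

def benchmark(buffer_kb: int) -> float:
    """Stand-in for a real micro-benchmark; lower score (latency) is better."""
    return abs(buffer_kb - 512) / 512 + random.uniform(0, 0.05)

def tune(param_space, trials=30, seed=1):
    random.seed(seed)
    best_value, best_score = None, float("inf")
    for _ in range(trials):
        candidate = random.choice(param_space)
        score = benchmark(candidate)
        if score < best_score:          # track the best configuration seen so far
            best_value, best_score = candidate, score
    return best_value, best_score

best, score = tune(param_space=[64, 128, 256, 512, 1024, 2048])
print(f"best buffer size for this instance: {best} KB (score {score:.3f})")
```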
Yujie Wang, Xin Du, Xuzhao Chen (2021)
Artificial intelligence is one of the key technologies for industrial applications, but it requires substantial computing resources and sensor data to support it. With the development of edge computing and the Internet of Things, artificial intelligence is playing an increasingly important role in edge services. How to make intelligent algorithms provide better services as the Internet of Things grows has therefore become an increasingly important topic. This paper focuses on edge service distribution and proposes an edge service distribution strategy based on intelligent prediction, which reduces the bandwidth consumption of edge service providers and minimizes their cost. In addition, the paper uses real data provided by the Wangsu Technology Company and an improved long short-term memory (LSTM) prediction method to dynamically adjust bandwidth, achieving better resource allocation than actual industrial practice. Simulation results show that the intelligent prediction achieves good accuracy and that the mechanism achieves higher resource utilization.
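As a rough, hypothetical sketch of the predict-then-provision idea (with a simple moving average standing in for the paper's improved LSTM), the provider forecasts the next interval's demand and reserves that amount plus a small headroom instead of statically provisioning for peak demand. All numbers and names below are made up.

```python
# Minimal sketch of predict-then-provision: forecast next-interval demand and
# reserve bandwidth with a small headroom, rather than buying static peak
# capacity. A moving average stands in for the paper's improved LSTM predictor.
import numpy as np

def forecast_demand(history, window=6):
    """Predict next-interval bandwidth demand (Mbps) from recent history."""
    return float(np.mean(history[-window:]))

def provision(history, headroom=0.15):
    """Reserve predicted demand plus headroom instead of the static peak."""
    return forecast_demand(history) * (1 + headroom)

demand = np.array([120, 135, 150, 160, 140, 155, 170, 165])  # toy Mbps trace
print("reserve %.1f Mbps vs static peak %.1f Mbps"
      % (provision(demand), demand.max() * 1.15))
```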
We document the data transfer workflow, data transfer performance, and other aspects of staging approximately 56 terabytes of climate model output data from the distributed Coupled Model Intercomparison Project (CMIP5) archive to the National Energy Research Supercomputing Center (NERSC) at the Lawrence Berkeley National Laboratory, as required for tracking and characterizing extratropical storms, a phenomenon of importance in the mid-latitudes. We present this analysis to illustrate the current challenges in assembling multi-model data sets at major computing facilities for large-scale studies of CMIP5 data. Because of the larger archive size of the upcoming CMIP6 phase of model intercomparison, we expect such data transfers to become of increasing importance, and perhaps of routine necessity. We find that data transfer rates using the ESGF are often slower than what is typically available to US residences and that there is significant room for improvement in the data transfer capabilities of the ESGF portal and data centers, both in terms of workflow mechanics and data transfer performance. We believe performance improvements of at least an order of magnitude are within technical reach using current best practices, as illustrated by the performance we achieved in transferring the complete raw data set between two high performance computing facilities. To achieve these performance improvements, we recommend: that current best practices (such as the Science DMZ model) be applied to the data servers and networks at ESGF data centers; that sufficient financial and human resources be devoted at the ESGF data centers to systems and network engineering tasks that support high performance data movement; and that performance metrics for data transfer between ESGF data centers and major computing facilities used for climate data analysis be established, regularly tested, and published.
Critical infrastructure protection (CIP) is envisioned to be one of the most challenging security problems in the coming decade. One key challenge in CIP is the ability to allocate resources, either personnel or cyber, to critical infrastructures with different vulnerability and criticality levels. In this work, a contract-theoretic approach is proposed to solve the problem of resource allocation in critical infrastructure under asymmetric information. A control center (CC) designs contracts and offers them to infrastructure owners. A contract can be seen as an agreement between the CC and an infrastructure under which the CC allocates resources and receives rewards in return. Contracts are designed to maximize the CC's benefit and to motivate each infrastructure to accept a contract and obtain proper resources for its protection. Infrastructures are characterized by both vulnerability and criticality levels, which are unknown to the CC. Therefore, each infrastructure can claim that it is the most vulnerable or critical in order to gain more resources. A novel mechanism is developed to handle this asymmetric information while providing the optimal contract that motivates each infrastructure to reveal its actual type. The necessary and sufficient conditions for such resource allocation contracts under asymmetric information are derived. Simulation results show that the proposed contract-theoretic approach maximizes the CC's utility while ensuring that no infrastructure has an incentive to ask for another contract, despite the lack of exact information at the CC.
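For readers unfamiliar with contract theory, the following is a generic formulation of this kind of adverse-selection contract design problem, not the paper's exact model: the CC offers a menu of contracts (r_i, p_i) pairing allocated resources r_i with rewards p_i, one per infrastructure type θ_i with prior probability q_i, subject to individual rationality (each type accepts its contract) and incentive compatibility (each type prefers its own contract to any other).

```latex
% Generic adverse-selection contract design (illustrative, not the paper's
% exact model): c(r_i) is the CC's cost of allocating resources r_i, and
% u(theta_i, r_i) is the protection value of r_i to an infrastructure of type theta_i.
\begin{align*}
\max_{\{(r_i, p_i)\}} \quad & \sum_i q_i \bigl(p_i - c(r_i)\bigr) \\
\text{s.t.} \quad & u(\theta_i, r_i) - p_i \ \ge\ 0
    && \text{(individual rationality)} \\
 & u(\theta_i, r_i) - p_i \ \ge\ u(\theta_i, r_j) - p_j \quad \forall j \ne i
    && \text{(incentive compatibility)}
\end{align*}
```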
Yifu Yang, Gang Wu, Weidang Lu (2020)
A Load Balancing Relay Algorithm (LBRA) is proposed to address unfair spectrum resource allocation in traditional mobile MTC relaying. To obtain reasonable use of spectrum resources and a balanced distribution of MTC devices (MTCDs), spectrum resources are dynamically allocated by regrouping MTCDs on the MTCD-to-MTC-gateway link. Moreover, the system outage probability and transmission capacity are derived when using LBRA. The numerical results show that the proposed algorithm outperforms the traditional method in transmission capacity and outage probability, with a gain in transmission capacity of about 0.7 dB and an improvement in outage probability of about 0.8 dB at high MTCD density.
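As a loose illustration of the regrouping idea (not the actual LBRA), the sketch below greedily reassigns MTCDs to relay groups so that group loads stay balanced; the device demands and number of groups are invented for the example.

```python
# Illustrative only: a greedy regrouping that balances MTCDs across relay
# groups on the MTCD-to-gateway link, in the spirit of (but not identical to)
# the LBRA described above.
def regroup(device_demands, num_groups):
    """Assign each device to the currently least-loaded relay group."""
    groups = [[] for _ in range(num_groups)]
    loads = [0.0] * num_groups
    # Placing the heaviest demands first keeps the final group loads closer together.
    for dev, demand in sorted(device_demands.items(), key=lambda kv: -kv[1]):
        g = loads.index(min(loads))
        groups[g].append(dev)
        loads[g] += demand
    return groups, loads

groups, loads = regroup({"d1": 3.0, "d2": 1.5, "d3": 2.2, "d4": 0.8, "d5": 2.9}, 2)
print(groups, loads)
```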
