In this paper we present a system for monitoring and controlling dynamic network circuits inside the USLHCNet network. This distributed service provides, in near real time, complete topological information for all circuits together with resource allocation, usage, and accounting; it automatically detects failures in links and network equipment, generates alarms, and can take automatic corrective actions. The system is built on the MonALISA framework, which provides a robust, service-oriented architecture for monitoring and control with no single point of failure.
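To make the detect-alarm-act pipeline concrete, the following is a minimal illustrative sketch of one monitoring pass. It is not MonALISA's API (MonALISA is a Java framework); poll_link_status, reroute_circuit, and the failure threshold are hypothetical placeholders.

    # Illustrative sketch only: placeholder status query, alarm, and automatic action.
    FAILURE_THRESHOLD = 3        # consecutive failed checks before automatic action

    failure_counts = {}

    def poll_link_status(link_id):
        """Placeholder for a real status query against the monitored link."""
        return "up"

    def reroute_circuit(circuit_id):
        """Placeholder for the automatic corrective action."""
        print(f"rerouting circuit {circuit_id}")

    def check_circuits(circuits):
        """One monitoring pass: detect failures, raise alarms, act on repeats."""
        for circuit_id, link_id in circuits.items():
            if poll_link_status(link_id) != "up":
                failure_counts[circuit_id] = failure_counts.get(circuit_id, 0) + 1
                print(f"ALARM: link {link_id} of circuit {circuit_id} is down")
                if failure_counts[circuit_id] >= FAILURE_THRESHOLD:
                    reroute_circuit(circuit_id)
                    failure_counts[circuit_id] = 0
            else:
                failure_counts[circuit_id] = 0

    check_circuits({"circuit-A": "link-1", "circuit-B": "link-2"})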
The Internet of Things (IoT) promises to help solve a wide range of issues related to our wellbeing within application domains that include smart cities, healthcare monitoring, and environmental monitoring. IoT is enabling new wireless sensor use cases by taking advantage of the computing power and flexibility provided by Edge and Cloud Computing. However, the software and hardware resources used within such applications must perform correctly and optimally, especially in applications where a resource failure can be critical. Service Level Agreements (SLAs), in which the performance requirements of such applications are defined, need to be specified in a standard way that reflects the end-to-end nature of IoT application domains, accounting for Quality of Service (QoS) metrics at every layer, including the Edge, Network Gateways, and the Cloud. In this paper, we propose a conceptual model that captures the key entities of an SLA and their relationships, as a preliminary step toward end-to-end SLA specification and composition. Service Level Objective (SLO) terms are also used to express the QoS constraints. Moreover, we propose a new SLA grammar that accounts for workflow activities and the multi-layered nature of IoT applications. Accordingly, we develop a tool for SLA specification and composition that can be used as a template to generate SLAs in a machine-readable format. We demonstrate the effectiveness of the proposed specification language through a literature survey that includes a comparative analysis of SLA languages, and through the user-satisfaction results of a usability study.
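As an illustration of what an end-to-end, machine-readable IoT SLA could look like, the sketch below builds a layered template with per-layer SLO terms and serializes it to JSON. All field names, metrics, and values are invented for illustration and do not reproduce the paper's grammar.

    # Hypothetical end-to-end IoT SLA template: field names and SLO metrics are
    # illustrative assumptions, not the grammar proposed in the paper.
    import json

    sla = {
        "application": "remote-patient-monitoring",
        "workflow_activities": ["capture", "filter", "ingest", "analyse"],
        "layers": {
            "edge":    {"slo": [{"metric": "sampling_rate", "operator": ">=", "value": 50, "unit": "Hz"}]},
            "gateway": {"slo": [{"metric": "forwarding_latency", "operator": "<=", "value": 20, "unit": "ms"}]},
            "cloud":   {"slo": [{"metric": "availability", "operator": ">=", "value": 99.9, "unit": "%"}]},
        },
        "penalty": {"violation_credit_percent": 10},
    }

    print(json.dumps(sla, indent=2))   # machine-readable SLA document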
We design a dispatch system to improve the peak service quality of video on demand (VOD). Our system predicts the hot videos during the peak hours of the next day from historical requests, and dispatches them to the content delivery networks (CDNs) during the preceding off-peak period. To scale to billions of videos, we build the system with two neural networks, one for video clustering and the other for developing the dispatch policy. The clustering network employs autoencoder layers and reduces the number of videos to a fixed number of clusters. The policy network employs fully connected layers and ranks the clustered videos with dispatch probabilities. The two networks are coupled through weight-sharing temporal layers, which analyze the video request sequences with convolutional and recurrent modules. Consequently, the clustering and dispatch tasks are trained end to end. Real-world results show that our approach achieves an average prediction accuracy of 17%, compared with 3% for the current baseline method, for the same number of dispatches.
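A minimal PyTorch sketch of how such a coupled architecture could be wired is given below. Layer sizes are assumptions, and the clustering head is simplified to a soft-assignment layer rather than the paper's autoencoder; the point is only to show the shared temporal encoder feeding both the clustering and the dispatch-policy heads.

    # Minimal PyTorch sketch (assumed sizes, simplified clustering head).
    import torch
    import torch.nn as nn

    class SharedTemporal(nn.Module):
        """Weight-shared temporal layers over per-video request sequences."""
        def __init__(self, hidden=64):
            super().__init__()
            self.conv = nn.Conv1d(1, 16, kernel_size=3, padding=1)
            self.gru = nn.GRU(16, hidden, batch_first=True)

        def forward(self, x):                              # x: (num_videos, seq_len)
            h = torch.relu(self.conv(x.unsqueeze(1)))      # (V, 16, T)
            _, last = self.gru(h.transpose(1, 2))          # (1, V, hidden)
            return last.squeeze(0)                         # (V, hidden)

    class ClusterNet(nn.Module):
        """Soft assignment of videos to a fixed number of clusters."""
        def __init__(self, hidden=64, k=256):
            super().__init__()
            self.assign = nn.Linear(hidden, k)

        def forward(self, emb):                            # (V, hidden)
            return torch.softmax(self.assign(emb), dim=1)  # (V, K)

    class PolicyNet(nn.Module):
        """Fully connected head ranking clusters by dispatch probability."""
        def __init__(self, hidden=64):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, 1))

        def forward(self, cluster_emb):                    # (K, hidden)
            return torch.sigmoid(self.fc(cluster_emb)).squeeze(-1)   # (K,)

    temporal = SharedTemporal()
    requests = torch.rand(1000, 48)            # 1000 videos, 48 hourly request counts
    emb = temporal(requests)                   # shared temporal embeddings
    assign = ClusterNet()(emb)                 # soft video-to-cluster assignment
    cluster_emb = assign.t() @ emb             # aggregate video embeddings per cluster
    dispatch_prob = PolicyNet()(cluster_emb)   # which clusters to push to the CDNs

Because the same temporal module feeds both heads, a loss on the dispatch probabilities back-propagates through the clustering step, which is what allows the two tasks to be trained end to end.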
Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, and constitutes a significant fraction of the compute demand on cloud infrastructure. Thus, improving the execution efficiency of neural recommendation directly translates into infrastructure capacity savings. In this paper, we devise a novel end-to-end modeling infrastructure, DeepRecInfra, which adopts an algorithm and system co-design methodology to custom-design systems for recommendation use cases. Leveraging insights from this recommendation characterization, we propose a new dynamic scheduler, DeepRecSched, that maximizes latency-bounded throughput by taking into account inference query sizes and arrival patterns, recommendation model architectures, and the underlying hardware systems. By doing so, system throughput is doubled across the eight industry-representative recommendation models. Finally, design, deployment, and evaluation in an at-scale production datacenter show an over-30% latency reduction across a wide variety of recommendation models running on hundreds of machines.
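The sketch below conveys the flavor of a query-size-aware dynamic scheduler for latency-bounded throughput. The cost model, thresholds, and the CPU/GPU split rule are invented for illustration and are not DeepRecSched's actual policy.

    # Hedged sketch of a query-size-aware scheduler; all constants are assumptions.
    import math

    def schedule(query_size, tail_latency_target_ms,
                 cpu_ms_per_item=0.05, gpu_fixed_ms=2.0, gpu_ms_per_item=0.005,
                 max_batch=256):
        """Return (device, per-request batch size) for one recommendation query."""
        cpu_ms = query_size * cpu_ms_per_item
        gpu_ms = gpu_fixed_ms + query_size * gpu_ms_per_item
        # Large queries amortise the accelerator's fixed offload cost.
        device, est_ms = ("gpu", gpu_ms) if gpu_ms < cpu_ms else ("cpu", cpu_ms)
        # Split the query into enough pieces to stay within the latency budget.
        pieces = max(1, math.ceil(est_ms / tail_latency_target_ms))
        batch = min(max_batch, math.ceil(query_size / pieces))
        return device, batch

    print(schedule(query_size=512, tail_latency_target_ms=10))   # large query -> gpu
    print(schedule(query_size=16, tail_latency_target_ms=10))    # small query -> cpu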
Job submissions of parallel applications to production supercomputer systems must be carefully tuned in terms of the submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource management framework that uses predictions of queue waiting and execution times to minimize the response times of user jobs submitted to supercomputer systems. Our method for predicting queue waiting times adaptively chooses a prediction method based on the cluster structure of similar jobs. Our strategy for execution time prediction dynamically learns the impact of load on execution times and uses this to predict a set of execution time ranges for the target job. We have developed two resource management techniques that employ these predictions: one selects the number of processors for execution, and the other additionally changes the job submission time dynamically. Using workload simulations of large supercomputer traces, we show substantial improvements in prediction accuracy and reductions in response times over existing techniques and baseline strategies.
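The core decision, picking a request that minimizes predicted queue wait plus predicted execution time, can be sketched as below. The two predictor functions are crude stand-ins for the paper's adaptive and learned models, and the candidate processor counts are arbitrary.

    # Illustrative sketch: choose a processor count that minimises predicted
    # response time (queue wait + worst-case execution time). Predictors are stand-ins.
    def predict_queue_wait(processors):
        """Stand-in: requesting more processors tends to lengthen the wait (seconds)."""
        return 120.0 * processors ** 0.5

    def predict_execution_range(processors, serial_fraction=0.1, t1=3600.0):
        """Stand-in Amdahl-style (best, worst) execution time range in seconds."""
        t = t1 * (serial_fraction + (1 - serial_fraction) / processors)
        return 0.9 * t, 1.2 * t

    def best_request(candidates=(16, 32, 64, 128, 256)):
        def worst_case_response(p):
            return predict_queue_wait(p) + predict_execution_range(p)[1]
        return min(candidates, key=worst_case_response)

    print("request", best_request(), "processors")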
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian generative models, which provide a natural framework for inferring latent entities with rigorous quantification of uncertainty. Despite these advantages, existing models are severely limited in practice, as standard inference algorithms scale quadratically in the number of records. While scaling can be managed by fitting the model on separate blocks of the data, such a naive approach may induce significant error in the posterior. In this paper, we propose a principled model for scalable Bayesian ER, called distributed Bayesian linkage or d-blink, which jointly performs blocking and ER without compromising posterior correctness. Our approach relies on several key ideas, including: (i) an auxiliary variable representation that induces a partition of the entities and records into blocks; (ii) a method for constructing well-balanced blocks based on k-d trees; (iii) a distributed partially-collapsed Gibbs sampler with improved mixing; and (iv) fast algorithms for performing Gibbs updates. Empirical studies on six data sets---including a case study on the 2010 Decennial Census---demonstrate the scalability and effectiveness of our approach.
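Of the four ideas, the k-d tree construction of well-balanced blocks is the easiest to illustrate. The sketch below recursively splits numeric record representations at the median of an alternating attribute; it is a simplified illustration under assumed inputs, not d-blink's implementation.

    # Hedged sketch of k-d-tree blocking: median splits on alternating attributes
    # yield roughly equal-sized blocks of records. Simplified illustration only.
    import numpy as np

    def kd_blocks(records, max_block_size=4, depth=0):
        """records: (n, d) array of numeric attribute representations."""
        if len(records) <= max_block_size:
            return [records]
        axis = depth % records.shape[1]
        median = np.median(records[:, axis])
        left = records[records[:, axis] <= median]
        right = records[records[:, axis] > median]
        if len(left) == 0 or len(right) == 0:      # degenerate split: stop here
            return [records]
        return (kd_blocks(left, max_block_size, depth + 1)
                + kd_blocks(right, max_block_size, depth + 1))

    rng = np.random.default_rng(0)
    blocks = kd_blocks(rng.normal(size=(40, 3)))
    print([len(b) for b in blocks])                # roughly balanced block sizes

Balanced blocks matter here because each block is updated by a separate worker of the distributed Gibbs sampler, so evenly sized blocks keep the per-iteration work evenly spread.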