In this paper we present a system for monitoring and controlling dynamic network circuits inside the USLHCNet network. This distributed service provides, in near real time, complete topological information for all circuits together with resource allocation, usage, and accounting; it automatically detects failures in links and network equipment, generates alarms, and can take automatic corrective actions. The system is built on the MonALISA framework, which provides a robust, service-oriented architecture for monitoring and control with no single point of failure.
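To make the detect-alarm-act pipeline concrete, the following is a minimal illustrative sketch of one monitoring pass. It is not MonALISA's API (MonALISA is a Java framework); poll_link_status, reroute_circuit, and the failure threshold are hypothetical placeholders.

    # Illustrative sketch only: placeholder status query, alarm, and automatic action.
    FAILURE_THRESHOLD = 3        # consecutive failed checks before automatic action

    failure_counts = {}

    def poll_link_status(link_id):
        """Placeholder for a real status query against the monitored link."""
        return "up"

    def reroute_circuit(circuit_id):
        """Placeholder for the automatic corrective action."""
        print(f"rerouting circuit {circuit_id}")

    def check_circuits(circuits):
        """One monitoring pass: detect failures, raise alarms, act on repeats."""
        for circuit_id, link_id in circuits.items():
            if poll_link_status(link_id) != "up":
                failure_counts[circuit_id] = failure_counts.get(circuit_id, 0) + 1
                print(f"ALARM: link {link_id} of circuit {circuit_id} is down")
                if failure_counts[circuit_id] >= FAILURE_THRESHOLD:
                    reroute_circuit(circuit_id)
                    failure_counts[circuit_id] = 0
            else:
                failure_counts[circuit_id] = 0

    check_circuits({"circuit-A": "link-1", "circuit-B": "link-2"})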
The Internet of Things (IoT) promises to help solve a wide range of issues related to our wellbeing within application domains that include smart cities, healthcare monitoring, and environmental monitoring. IoT is enabling new wireless sensor use cases by taking advantage of the computing power and flexibility provided by Edge and Cloud Computing. However, the software and hardware resources used within such applications must perform correctly and optimally, especially in applications where a resource failure can be critical. Service Level Agreements (SLAs), in which the performance requirements of such applications are defined, need to be specified in a standard way that reflects the end-to-end nature of IoT application domains, accounting for Quality of Service (QoS) metrics at every layer, including the Edge, Network Gateways, and the Cloud. In this paper, we propose a conceptual model that captures the key entities of an SLA and their relationships, as a preliminary step toward end-to-end SLA specification and composition. Service Level Objective (SLO) terms are also used to express the QoS constraints. Moreover, we propose a new SLA grammar that accounts for workflow activities and the multi-layered nature of IoT applications. Accordingly, we develop a tool for SLA specification and composition that can be used as a template to generate SLAs in a machine-readable format. We demonstrate the effectiveness of the proposed specification language through a literature survey that includes a comparative analysis of SLA languages, and through the user-satisfaction results of a usability study.
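As an illustration of what an end-to-end, machine-readable IoT SLA could look like, the sketch below builds a layered template with per-layer SLO terms and serializes it to JSON. All field names, metrics, and values are invented for illustration and do not reproduce the paper's grammar.

    # Hypothetical end-to-end IoT SLA template: field names and SLO metrics are
    # illustrative assumptions, not the grammar proposed in the paper.
    import json

    sla = {
        "application": "remote-patient-monitoring",
        "workflow_activities": ["capture", "filter", "ingest", "analyse"],
        "layers": {
            "edge":    {"slo": [{"metric": "sampling_rate", "operator": ">=", "value": 50, "unit": "Hz"}]},
            "gateway": {"slo": [{"metric": "forwarding_latency", "operator": "<=", "value": 20, "unit": "ms"}]},
            "cloud":   {"slo": [{"metric": "availability", "operator": ">=", "value": 99.9, "unit": "%"}]},
        },
        "penalty": {"violation_credit_percent": 10},
    }

    print(json.dumps(sla, indent=2))   # machine-readable SLA document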
We design a dispatch system to improve the peak service quality of video on demand (VOD). Our system predicts the hot videos during the peak hours of the next day from historical requests, and dispatches them to the content delivery networks (CDNs) during the preceding off-peak period. To scale to billions of videos, we build the system with two neural networks, one for video clustering and the other for developing the dispatch policy. The clustering network employs autoencoder layers and reduces the number of videos to a fixed number of clusters. The policy network employs fully connected layers and ranks the clustered videos with dispatch probabilities. The two networks are coupled through weight-sharing temporal layers, which analyze the video request sequences with convolutional and recurrent modules. Consequently, the clustering and dispatch tasks are trained end to end. Real-world results show that our approach achieves an average prediction accuracy of 17%, compared with 3% for the current baseline method, for the same number of dispatches.
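A minimal PyTorch sketch of how such a coupled architecture could be wired is given below. Layer sizes are assumptions, and the clustering head is simplified to a soft-assignment layer rather than the paper's autoencoder; the point is only to show the shared temporal encoder feeding both the clustering and the dispatch-policy heads.

    # Minimal PyTorch sketch (assumed sizes, simplified clustering head).
    import torch
    import torch.nn as nn

    class SharedTemporal(nn.Module):
        """Weight-shared temporal layers over per-video request sequences."""
        def __init__(self, hidden=64):
            super().__init__()
            self.conv = nn.Conv1d(1, 16, kernel_size=3, padding=1)
            self.gru = nn.GRU(16, hidden, batch_first=True)

        def forward(self, x):                              # x: (num_videos, seq_len)
            h = torch.relu(self.conv(x.unsqueeze(1)))      # (V, 16, T)
            _, last = self.gru(h.transpose(1, 2))          # (1, V, hidden)
            return last.squeeze(0)                         # (V, hidden)

    class ClusterNet(nn.Module):
        """Soft assignment of videos to a fixed number of clusters."""
        def __init__(self, hidden=64, k=256):
            super().__init__()
            self.assign = nn.Linear(hidden, k)

        def forward(self, emb):                            # (V, hidden)
            return torch.softmax(self.assign(emb), dim=1)  # (V, K)

    class PolicyNet(nn.Module):
        """Fully connected head ranking clusters by dispatch probability."""
        def __init__(self, hidden=64):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, 1))

        def forward(self, cluster_emb):                    # (K, hidden)
            return torch.sigmoid(self.fc(cluster_emb)).squeeze(-1)   # (K,)

    temporal = SharedTemporal()
    requests = torch.rand(1000, 48)            # 1000 videos, 48 hourly request counts
    emb = temporal(requests)                   # shared temporal embeddings
    assign = ClusterNet()(emb)                 # soft video-to-cluster assignment
    cluster_emb = assign.t() @ emb             # aggregate video embeddings per cluster
    dispatch_prob = PolicyNet()(cluster_emb)   # which clusters to push to the CDNs

Because the same temporal module feeds both heads, a loss on the dispatch probabilities back-propagates through the clustering step, which is what allows the two tasks to be trained end to end.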
Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, and constitutes a significant fraction of the compute demand on cloud infrastructure. Thus, improving the execution efficiency of neural recommendation directly translates into infrastructure capacity savings. In this paper, we devise a novel end-to-end modeling infrastructure, DeepRecInfra, which adopts an algorithm and system co-design methodology to custom-design systems for recommendation use cases. Leveraging insights from this recommendation characterization, we propose a new dynamic scheduler, DeepRecSched, that maximizes latency-bounded throughput by taking into account inference query sizes and arrival patterns, recommendation model architectures, and the underlying hardware systems. By doing so, system throughput is doubled across the eight industry-representative recommendation models. Finally, design, deployment, and evaluation in an at-scale production datacenter show an over-30% latency reduction across a wide variety of recommendation models running on hundreds of machines.
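The sketch below conveys the flavor of a query-size-aware dynamic scheduler for latency-bounded throughput. The cost model, thresholds, and the CPU/GPU split rule are invented for illustration and are not DeepRecSched's actual policy.

    # Hedged sketch of a query-size-aware scheduler; all constants are assumptions.
    import math

    def schedule(query_size, tail_latency_target_ms,
                 cpu_ms_per_item=0.05, gpu_fixed_ms=2.0, gpu_ms_per_item=0.005,
                 max_batch=256):
        """Return (device, per-request batch size) for one recommendation query."""
        cpu_ms = query_size * cpu_ms_per_item
        gpu_ms = gpu_fixed_ms + query_size * gpu_ms_per_item
        # Large queries amortise the accelerator's fixed offload cost.
        device, est_ms = ("gpu", gpu_ms) if gpu_ms < cpu_ms else ("cpu", cpu_ms)
        # Split the query into enough pieces to stay within the latency budget.
        pieces = max(1, math.ceil(est_ms / tail_latency_target_ms))
        batch = min(max_batch, math.ceil(query_size / pieces))
        return device, batch

    print(schedule(query_size=512, tail_latency_target_ms=10))   # large query -> gpu
    print(schedule(query_size=16, tail_latency_target_ms=10))    # small query -> cpu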
Job submissions of parallel applications to production supercomputer systems must be carefully tuned in terms of the submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource management framework that uses predictions of queue waiting and execution times to minimize the response times of user jobs submitted to supercomputer systems. Our method for predicting queue waiting times adaptively chooses a prediction method based on the cluster structure of similar jobs. Our strategy for execution time prediction dynamically learns the impact of load on execution times and uses this to predict a set of execution time ranges for the target job. We have developed two resource management techniques that employ these predictions: one selects the number of processors for execution, and the other additionally changes the job submission time dynamically. Using workload simulations of large supercomputer traces, we show substantial improvements in prediction accuracy and reductions in response times over existing techniques and baseline strategies.
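The core decision, picking a request that minimizes predicted queue wait plus predicted execution time, can be sketched as below. The two predictor functions are crude stand-ins for the paper's adaptive and learned models, and the candidate processor counts are arbitrary.

    # Illustrative sketch: choose a processor count that minimises predicted
    # response time (queue wait + worst-case execution time). Predictors are stand-ins.
    def predict_queue_wait(processors):
        """Stand-in: requesting more processors tends to lengthen the wait (seconds)."""
        return 120.0 * processors ** 0.5

    def predict_execution_range(processors, serial_fraction=0.1, t1=3600.0):
        """Stand-in Amdahl-style (best, worst) execution time range in seconds."""
        t = t1 * (serial_fraction + (1 - serial_fraction) / processors)
        return 0.9 * t, 1.2 * t

    def best_request(candidates=(16, 32, 64, 128, 256)):
        def worst_case_response(p):
            return predict_queue_wait(p) + predict_execution_range(p)[1]
        return min(candidates, key=worst_case_response)

    print("request", best_request(), "processors")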
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian generative models, which provide a natural framework for inferring latent entities with rigorous quantification of uncertainty. Despite these advantages, existing models are severely limited in practice, as standard inference algorithms scale quadratically in the number of records. While scaling can be managed by fitting the model on separate blocks of the data, such a naive approach may induce significant error in the posterior. In this paper, we propose a principled model for scalable Bayesian ER, called distributed Bayesian linkage or d-blink, which jointly performs blocking and ER without compromising posterior correctness. Our approach relies on several key ideas, including: (i) an auxiliary variable representation that induces a partition of the entities and records into blocks; (ii) a method for constructing well-balanced blocks based on k-d trees; (iii) a distributed partially-collapsed Gibbs sampler with improved mixing; and (iv) fast algorithms for performing Gibbs updates. Empirical studies on six data sets---including a case study on the 2010 Decennial Census---demonstrate the scalability and effectiveness of our approach.
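Of the four ideas, the k-d tree construction of well-balanced blocks is the easiest to illustrate. The sketch below recursively splits numeric record representations at the median of an alternating attribute; it is a simplified illustration under assumed inputs, not d-blink's implementation.

    # Hedged sketch of k-d-tree blocking: median splits on alternating attributes
    # yield roughly equal-sized blocks of records. Simplified illustration only.
    import numpy as np

    def kd_blocks(records, max_block_size=4, depth=0):
        """records: (n, d) array of numeric attribute representations."""
        if len(records) <= max_block_size:
            return [records]
        axis = depth % records.shape[1]
        median = np.median(records[:, axis])
        left = records[records[:, axis] <= median]
        right = records[records[:, axis] > median]
        if len(left) == 0 or len(right) == 0:      # degenerate split: stop here
            return [records]
        return (kd_blocks(left, max_block_size, depth + 1)
                + kd_blocks(right, max_block_size, depth + 1))

    rng = np.random.default_rng(0)
    blocks = kd_blocks(rng.normal(size=(40, 3)))
    print([len(b) for b in blocks])                # roughly balanced block sizes

Balanced blocks matter here because each block is updated by a separate worker of the distributed Gibbs sampler, so evenly sized blocks keep the per-iteration work evenly spread.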