90 - Arti Yardi, Tejas Bodas 2021
Based on the covert communication framework, we consider a covert queueing problem that has a Markovian statistic. Willie jobs arrive according to a Poisson process and require service from server Bob. Bob does not have a queue for jobs to wait in, and hence when the server is busy, arriving Willie jobs are lost. Willie and Bob enter a contract under which Bob should only serve Willie jobs. As part of the usage statistic, for a sequence of N consecutive jobs that arrived, Bob informs Willie whether each job was served or lost (this is the Markovian statistic). Bob is assumed to be violating the contract and admitting non-Willie (Nillie) jobs according to a Poisson process. For such a setting, we identify the hypothesis test to be performed by Willie (given the Markovian data) to detect the presence or absence of Nillie jobs. We also characterize an upper bound on the arrival rate of Nillie jobs such that the error in Willie's hypothesis test remains arbitrarily large, ensuring covertness in admitting Nillie jobs.
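For intuition only, the following toy simulation sketches the loss system described in this abstract: two independent Poisson arrival streams, a single server with no waiting room, and Willie observing only the served/lost outcomes of his own jobs. All rates and names are illustrative; the paper's hypothesis test and covertness bound are not reproduced here.

```python
import random

def willie_outcomes(lam_w, lam_n, mu, n_jobs, seed=0):
    """Served/lost outcomes (1/0) of Willie's jobs in a single-server loss system."""
    rng = random.Random(seed)
    t, busy_until, outcomes = 0.0, 0.0, []
    while len(outcomes) < n_jobs:
        # Superposition of the two Poisson streams; tag each arrival by its origin.
        t += rng.expovariate(lam_w + lam_n)
        is_willie = rng.random() < lam_w / (lam_w + lam_n)
        if t >= busy_until:                      # server idle: the job is served
            busy_until = t + rng.expovariate(mu)
            if is_willie:
                outcomes.append(1)
        elif is_willie:                          # server busy: Willie's job is lost
            outcomes.append(0)
    return outcomes

# Illustrative rates: Willie arrivals at rate 1, service rate 2, Nillie rate 0.3.
lam_w, mu = 1.0, 2.0
honest = willie_outcomes(lam_w, 0.0, mu, 50_000)
covert = willie_outcomes(lam_w, 0.3, mu, 50_000, seed=1)
print("Willie loss fraction without Nillie:", 1 - sum(honest) / len(honest))
print("Willie loss fraction with Nillie:   ", 1 - sum(covert) / len(covert))
```

Admitting Nillie jobs raises the fraction of lost Willie jobs, which is exactly the statistical footprint that Willie's hypothesis test tries to detect and that the covertness bound must keep small.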
The Internet of Things (IoT) has already proven to be the building block for next-generation Cyber-Physical Systems (CPSs). The considerable amount of data generated by IoT devices needs latency-sensitive processing, which is not feasible by deploying the respective applications in remote Cloud datacentres. Edge/Fog computing, a promising extension of the Cloud at the IoT-proximate network, can meet such requirements for smart CPSs. However, the structural and operational differences of Edge/Fog infrastructure resist employing Cloud-based service regulations directly in these environments. As a result, many research works have recently been conducted focusing on efficient application and resource management in Edge/Fog computing environments. Validating these policies requires scalable Edge/Fog infrastructure, which is challenging to provision in the real world due to high cost and implementation time. With simulation as a key to overcoming this constraint, various software tools have been developed that imitate the physical behaviour of Edge/Fog computing environments. Nevertheless, the existing simulators often fail to support advanced service management features because of their monolithic architecture, lack of actual datasets, and limited scope for periodic updates. To overcome these issues, in this work we have developed multiple simulation models for service migration, dynamic distributed cluster formation, and microservice orchestration for Edge/Fog computing, and integrated them with the existing iFogSim simulation toolkit to launch it as iFogSim2. The performance of iFogSim2 and its built-in policies is evaluated using three use case scenarios and compared with contemporary simulators and benchmark policies under different settings. Results indicate that the proposed solution outperforms others in service management time, network usage, RAM consumption, and simulation time.
91 - Yige Hong, Weina Wang 2021
Multiserver jobs, which are jobs that occupy multiple servers simultaneously during service, are prevalent in today's computing clusters. But little is known about the delay performance of systems with multiserver jobs. We consider queueing models for multiserver jobs in a scaling regime where the total number of servers in the system becomes large and meanwhile both the system load and the number of servers that a job needs scale with the total number of servers. Prior work has derived upper bounds on the queueing probability in this scaling regime. However, without proper lower bounds, the existing results cannot be used to differentiate between policies. In this paper, we study the delay performance by establishing sharp bounds on the mean waiting time of multiserver jobs, where the waiting time of a job is the time spent in queueing rather than in service. We first consider the commonly used First-Come-First-Serve (FCFS) policy and characterize the exact order of its mean waiting time. We then prove a lower bound on the mean waiting time of all policies, and demonstrate that there is an order gap between this lower bound and the mean waiting time under FCFS. We finally complement the lower bound with an achievability result: we show that under a priority policy that we call P-Priority, the mean waiting time achieves the order of the lower bound. This achievability result implies the tightness of the lower bound, the asymptotic optimality of P-Priority, and the strict suboptimality of FCFS.
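To make the FCFS baseline concrete, here is a minimal discrete-event sketch of a multiserver-job queue in which every job needs the same number of servers and the head-of-line job blocks everyone behind it. The parameters (100 servers, 20 servers per job, the rates) are illustrative and not taken from the paper; the P-Priority policy and the asymptotic analysis are not reproduced.

```python
import heapq
import random

def fcfs_mean_wait(n_servers, lam, mu, need, n_jobs, seed=0):
    """Mean waiting time under FCFS when every job needs `need` servers at once."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)              # Poisson arrivals
        arrivals.append(t)

    free, queue, departures, waits = n_servers, [], [], []
    i = 0
    while len(waits) < n_jobs:
        # Next event: either the next arrival or the earliest departure.
        next_arr = arrivals[i] if i < n_jobs else float("inf")
        if departures and departures[0][0] <= next_arr:
            now, released = heapq.heappop(departures)
            free += released
        else:
            now = next_arr
            queue.append(now)
            i += 1
        # FCFS: start jobs strictly in arrival order while enough servers are free.
        while queue and free >= need:
            arr = queue.pop(0)
            waits.append(now - arr)
            free -= need
            heapq.heappush(departures, (now + rng.expovariate(mu), need))
    return sum(waits) / len(waits)

# Illustrative setting: average demand 2.0 * (1/0.5) * 20 = 80 of 100 servers.
print(fcfs_mean_wait(n_servers=100, lam=2.0, mu=0.5, need=20, n_jobs=20_000))
```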
In this paper, we propose an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Since data traces are independent of system hardware architectures, a designer can easily compute the best cache design at the early system design phase using our approach. We devise a very efficient and yet accurate method to derive the aggregated reuse-distance histograms of concurrent applications for accurate cache performance analysis and optimization. Essentially, the actual shared-cache contention results of concurrent applications are embedded in the aggregated reuse-distance histograms, and therefore the approach is very effective. The experimental results show that the average error rate of the shared-cache miss-count estimations of our approach is less than 2.4%. Using a simple scanning search method, one can easily determine the true optimal cache configurations at the early system design phase.
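The sketch below shows the textbook building block behind this kind of analysis: computing an LRU reuse-distance (stack-distance) histogram from an address trace and estimating miss counts for a fully associative LRU cache of a given capacity. The paper's aggregation of histograms across concurrent applications and its handling of real cache geometries are not reproduced; the trace is a made-up example.

```python
def reuse_distance_histogram(trace):
    """Histogram of reuse distances for an address trace (LRU stack distances)."""
    stack, hist = [], {}                     # stack[-1] is the most recently used
    for addr in trace:
        if addr in stack:
            # Number of distinct other addresses touched since the last access.
            dist = len(stack) - 1 - stack.index(addr)
            stack.remove(addr)
        else:
            dist = float("inf")              # cold (first-ever) access
        hist[dist] = hist.get(dist, 0) + 1
        stack.append(addr)
    return hist

def estimated_misses(hist, capacity):
    # Fully associative LRU of size `capacity`: an access with reuse distance
    # >= capacity (including cold accesses) cannot hit.
    return sum(count for dist, count in hist.items() if dist >= capacity)

trace = ["A", "B", "C", "A", "B", "D", "A"]   # illustrative block-address trace
hist = reuse_distance_histogram(trace)
print(hist)
print("misses with 2-block cache:", estimated_misses(hist, 2))
print("misses with 4-block cache:", estimated_misses(hist, 4))
```

Because the histogram is computed once per trace, sweeping the cache capacity (the "simple scanning search" mentioned above) only requires re-evaluating the cheap summation, not re-simulating the cache.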
In this paper, we present a systematic approach that transforms the program execution trace into the frequency domain and precisely identifies program phases. The analyzed results can be embedded into program code to mark the starting point and execution characteristics, such as CPI (Cycles per Instruction), of each phase. The generated information can be applied to runtime program phase prediction. With precise program phase information, more intelligent software and system optimization techniques can be further explored and developed.
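As a small illustration of the frequency-domain idea, the sketch below builds a synthetic per-interval CPI series with two alternating phases, takes its FFT, and reads off the dominant period. The paper's full pipeline (real execution traces, phase-boundary marking, and code annotation) is not reproduced, and all lengths and CPI values are invented.

```python
import numpy as np

intervals = 1024
phase_len = 64                                  # each synthetic phase lasts 64 intervals
# Alternate between a 0.9-CPI phase and a 1.6-CPI phase, plus measurement noise.
cpi = np.where((np.arange(intervals) // phase_len) % 2 == 0, 0.9, 1.6)
cpi = cpi + 0.05 * np.random.default_rng(0).standard_normal(intervals)

# Remove the mean so the DC bin does not dominate, then inspect the spectrum.
spectrum = np.abs(np.fft.rfft(cpi - cpi.mean()))
freqs = np.fft.rfftfreq(intervals, d=1.0)       # in cycles per interval
dominant = freqs[np.argmax(spectrum)]
print(f"dominant period ~ {1.0 / dominant:.0f} intervals (true period: {2 * phase_len})")
```

The peak frequency recovers the phase period, which is the kind of information that can then be attached to the code as phase markers for runtime prediction.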
In this paper, we propose the first optimum process scheduling algorithm for an increasingly prevalent type of heterogeneous multicore (HEMC) system that combines high-performance big cores and energy-efficient small cores with the same instruction-set architecture (ISA). Existing algorithms are all heuristics-based, and the well-known IPC-driven approach essentially tries to schedule high scaling factor processes on big cores. Our analysis shows that, for optimum solutions, it is also critical to consider placing long-running processes on big cores. Tests of SPEC 2006 cases on various big-small core combinations show that our proposed optimum approach is up to 34% faster than the IPC-driven heuristic approach in terms of total workload completion time. The complexity of our algorithm is O(N log N), where N is the number of processes. Therefore, the proposed optimum algorithm is practical for use.
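For reference, here is a minimal sketch of the IPC-driven baseline described above: rank processes by their big-core scaling factor (big-core IPC divided by small-core IPC) and place the highest-scaling ones on the big cores. The optimum algorithm of the paper, which also weighs how long each process still has to run, is not reproduced; all names and IPC numbers are invented.

```python
# (name, big-core IPC, small-core IPC, remaining instructions) - illustrative only
processes = [
    ("p0", 2.4, 0.8, 9e9),
    ("p1", 1.6, 1.2, 2e9),
    ("p2", 2.0, 0.7, 1e9),
    ("p3", 1.5, 1.0, 8e9),
]
n_big_cores = 2

# IPC-driven heuristic: highest scaling factor (big IPC / small IPC) wins a big core.
ranked = sorted(processes, key=lambda p: p[1] / p[2], reverse=True)
on_big = {p[0] for p in ranked[:n_big_cores]}
for name, *_ in processes:
    print(name, "-> big core" if name in on_big else "-> small core")
```

Note how this ranking ignores the remaining-instructions column, which is exactly the information the paper argues an optimum schedule must also take into account.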
Using a realistic molecular catalyst system, we conduct scaling studies of ab initio molecular dynamics simulations using the CP2K code on both Intel Xeon CPU and NVIDIA V100 GPU architectures. We explore using process placement and affinity to gain additional performance improvements. We also use statistical methods to understand performance changes in spite of the variability in runtime for each molecular dynamics timestep. We found that ideal conditions for CPU runs included at least four MPI ranks per node, bound evenly across each socket, with one OpenMP thread per core to fully utilize the processing cores; no benefit was seen from reserving cores for the system. The CPU-only simulations scaled at 70% or more of ideal scaling up to 10 compute nodes, after which the returns began to diminish more quickly. Simulations on a single 40-core node with two NVIDIA V100 GPUs for acceleration achieved over a 3.7x speedup compared to the fastest single 36-core node CPU-only version, and showed a 13% speedup over the fastest time we achieved across five CPU-only nodes.
In this paper, we extend the concept of the traditional transactor, which focuses on correct content transfer, to a new timing-coherent transactor that also accurately aligns the timing of each transaction boundary, so that designers can perform precise concurrent system behavior analysis in mixed-abstraction-level system simulations, which are essential to increasingly complex system designs. To streamline the process, we also developed an automatic approach for timing-coherent transactor generation. We applied our approach in mixed-level simulations, and the results show that it achieves 100% timing accuracy, while the conventional approach produces error rates of 25% to 44%.
Linear Model Predictive Control (MPC) is a widely used method for controlling systems with linear dynamics. Efficient interior-point methods have been proposed that leverage the block-diagonal structure of the quadratic program (QP) resulting from the receding horizon control formulation. However, they require two matrix factorizations per interior-point iteration, one each for the computation of the dual and the primal. Recently, an interior-point method based on the null-space method has been proposed that requires only a single decomposition per iteration. While the null-space basis used there leads to dense null-space projections, in this work we propose a sparse null-space basis that preserves the block-diagonal structure of the MPC matrices. Since it is based on the inverse of the transfer matrix, we introduce the notion of so-called virtual controls, which enables just that invertibility. The combination of the reduced number of factorizations and the omission of the dual evaluation lets our solver outperform others in terms of computational speed, by a margin that grows with the number of state and control variables.
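For readers unfamiliar with the null-space method this abstract builds on, the sketch below solves a generic equality-constrained QP (the subproblem arising inside each interior-point iteration) via a null-space basis. The matrices H, g, E, c and the dense solves are placeholders; the paper's contribution is a sparse, block-structure-preserving basis (enabled by virtual controls making the relevant block invertible), which this generic sketch does not exploit.

```python
import numpy as np

def nullspace_qp(H, g, E, c):
    """Minimize 0.5 z'Hz + g'z subject to Ez = c via the null-space method."""
    m, n = E.shape
    # Partition E = [E1 E2] with E1 square; assume E1 is invertible
    # (guaranteeing such invertibility is the role of the paper's virtual controls).
    E1, E2 = E[:, :m], E[:, m:]
    z_p = np.concatenate([np.linalg.solve(E1, c), np.zeros(n - m)])  # E @ z_p = c
    Z = np.vstack([-np.linalg.solve(E1, E2), np.eye(n - m)])         # E @ Z = 0
    # Reduced (projected) problem in the null-space coordinates w, where z = z_p + Z w.
    Hr = Z.T @ H @ Z
    gr = Z.T @ (g + H @ z_p)
    w = np.linalg.solve(Hr, -gr)
    return z_p + Z @ w

# Tiny random example to check feasibility of the returned point.
rng = np.random.default_rng(0)
n, m = 6, 2
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)                  # symmetric positive definite Hessian
g = rng.standard_normal(n)
E = rng.standard_normal((m, n))
c = rng.standard_normal(m)
z = nullspace_qp(H, g, E, c)
print("constraint residual:", np.linalg.norm(E @ z - c))
```

Only one factorization (of the reduced Hessian) is needed per solve, which is the efficiency argument the abstract makes; the sparsity of the basis then determines how cheap that factorization is.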
Operational networks increasingly use machine learning models for a variety of tasks, including detecting anomalies, inferring application performance, and forecasting demand. Accurate models are important, yet accuracy can degrade over time due to concept drift, whereby either the characteristics of the data change over time (data drift) or the relationship between the features and the target predictor changes over time (model drift). Drift is important to detect because changes in the properties of the underlying data or in their relationship to the target prediction can require model retraining, which can be time-consuming and expensive. Concept drift occurs in operational networks for a variety of reasons, ranging from software upgrades to seasonality to changes in user behavior. Yet, despite the prevalence of drift in networks, its extent and effects on prediction accuracy have not been extensively studied. This paper presents an initial exploration into concept drift in a large cellular network in the United States, for a major metropolitan area, in the context of demand forecasting. We find that concept drift arises largely due to data drift, and it appears across different key performance indicators (KPIs), models, training set sizes, and time intervals. We identify the sources of concept drift for the particular problem of forecasting downlink volume. Weekly and seasonal patterns introduce both high- and low-frequency model drift, while disasters and upgrades result in sudden drift due to exogenous shocks. Regions with high population density, lower traffic volumes, and higher speeds also tend to correlate with more concept drift. The features that contribute most significantly to concept drift are User Equipment (UE) downlink packets, UE uplink packets, and Real-time Transport Protocol (RTP) total received packets.
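To ground the notion of data drift, the sketch below applies one common detector (a two-sample Kolmogorov-Smirnov test on sliding windows of a KPI) to a synthetic series whose mean shifts abruptly halfway through. This is a generic technique for illustration, not necessarily the detection method used in the paper, and the "downlink volume" series is fabricated.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Synthetic per-interval downlink-volume KPI with a sudden exogenous shift.
kpi = np.concatenate([rng.normal(100, 10, 2000),    # before the shift
                      rng.normal(130, 10, 2000)])   # after the shift

window = 500
reference = kpi[:window]                            # distribution seen at training time
for start in range(window, len(kpi) - window + 1, window):
    current = kpi[start:start + window]
    result = ks_2samp(reference, current)           # compare current window to reference
    flag = "drift" if result.pvalue < 0.01 else "ok"
    print(f"window at {start:4d}: KS={result.statistic:.2f}, "
          f"p={result.pvalue:.1e} -> {flag}")
```

A flag like this is what would trigger the (time-consuming and expensive) model retraining that the abstract argues makes drift detection worthwhile.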