Because each time step depends strictly on the one before it, large-scale gas simulations using the Direct Simulation Monte Carlo (DSMC) method proceed at the pace of the slowest processor. Scalability is therefore achievable only if the work done in each time step is apportioned among the processors as evenly as possible. Furthermore, the load shifts as the simulated system evolves, so load balancing typically needs to be performed multiple times over the course of a simulation. Common methods generally use either crude performance models or processor-level timers. We combine both to create a timer-augmented cost function that converges quickly and yields well-balanced processor decompositions. When compared to a particle-based performance model alone, our method achieves a 2x speedup at steady state on up to 1024 processors for a test case consisting of a Mach 9 argon jet impacting a solid wall.
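To make the idea of combining a particle-count model with processor-level timers concrete, the following is a minimal sketch and not the cost function from the paper, whose exact form the abstract does not give. It assumes a one-dimensional row of cells, weights each cell's particle count by the measured seconds per particle on the processor that currently owns it, and then re-splits the cells into contiguous blocks of roughly equal cost. The names `timer_augmented_costs` and `partition_contiguous`, and the greedy splitting rule, are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def timer_augmented_costs(
    particles_per_cell: Sequence[int],
    owner: Sequence[int],
    proc_times: Sequence[float],
) -> List[float]:
    """Estimate per-cell cost as particle count times the owner's measured rate.

    Assumption (not from the source): cost is linear in particle count, with
    a per-processor rate in seconds per particle taken from timers.
    """
    n_procs = len(proc_times)
    particles_per_proc = [0] * n_procs
    for count, p in zip(particles_per_cell, owner):
        particles_per_proc[p] += count

    # Fall back to the global mean rate for processors that own no particles.
    total_particles = sum(particles_per_proc)
    mean_rate = sum(proc_times) / total_particles if total_particles else 0.0
    rates = [
        t / n if n else mean_rate
        for t, n in zip(proc_times, particles_per_proc)
    ]
    return [count * rates[p] for count, p in zip(particles_per_cell, owner)]


def partition_contiguous(costs: Sequence[float], n_procs: int) -> List[int]:
    """Greedy 1D split: move to the next processor once a cell's midpoint
    (in cumulative cost) passes the next ideal boundary."""
    total = sum(costs)
    new_owner: List[int] = []
    running = 0.0
    p = 0
    for c in costs:
        while p < n_procs - 1 and running + 0.5 * c >= total * (p + 1) / n_procs:
            p += 1
        new_owner.append(p)
        running += c
    return new_owner


if __name__ == "__main__":
    particles = [8, 8, 8, 8]   # equal particle counts in every cell
    owner = [0, 0, 1, 1]       # current decomposition
    proc_times = [2.0, 6.0]    # hypothetical timer readings in seconds
    costs = timer_augmented_costs(particles, owner, proc_times)
    print(costs)
    print(partition_contiguous(costs, 2))
```

In this toy input every cell holds the same number of particles, so a particle-only model sees the current decomposition as already balanced, while the timer-weighted costs shift one cell from the slow processor's neighbor to the fast processor. Repeating the measure-and-repartition cycle is what would let such a scheme track a load that shifts over time.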
Fat-tree networks have been widely adopted in High Performance Computing (HPC) clusters and in Data Center Networks (DCN). These parallel systems usually have a large number of servers and hosts, which generate large volumes of highly-volatile traffi
We present a novel framework, called balanced overlay networks (BON), that provides scalable, decentralized load balancing for distributed computing using large-scale pools of heterogeneous computers. Fundamentally, BON encodes the information about
We introduced load-balanced routing algorithms for interconnection networks resulting from nesting, which account for the data-forwarding pressure at each node. Benchmarks on a small cluster with various network topologies, and simulations f
Accurate load prediction is an effective way to reduce power system operation costs. Traditionally, the mean square error (MSE) is a commonly used loss function to guide the training of an accurate load forecasting model. However, the MSE loss function
The Load-Balanced Router architecture has received a lot of attention because it does not require centralized scheduling at the internal switch fabrics. In this paper we reexamine the architecture, motivated by its potential to turn off multiple comp