Maximizing the performance potential of modern GPU architectures requires judicious utilization of the available parallel resources. Although dramatic reductions in runtime can often be obtained through straightforward mappings, further performance improvements often require algorithmic redesigns that more closely exploit the target architecture. In this paper, we focus on efficient molecular simulations for the GPU and propose a novel cell list algorithm that better utilizes its parallel resources. Our goal is an efficient GPU implementation of large-scale Monte Carlo simulations for the grand canonical ensemble. This is a particularly challenging application because there is inherently less computation and parallelism than in similar applications with molecular dynamics. Consistent with the results of prior researchers, our simulation results show that traditional cell list implementations for Monte Carlo simulations of molecular systems offer effectively no performance improvement for small systems [5, 14], even when ported to the GPU. For larger systems, however, the cell list implementation offers significant performance gains. Furthermore, our novel cell list approach yields better performance for all problem sizes than other GPU implementations, with or without cell lists.
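As a rough illustration of the traditional cell list technique that the abstract above uses as its baseline, the following is a minimal CPU-side Python sketch. It is not the paper's novel GPU algorithm. It assumes a cubic periodic box and a spherical cutoff, and every name in it (`random_positions`, `build_cell_list`, `neighbors_cell_list`, `neighbors_brute_force`) is hypothetical.

```python
import itertools
import random


def random_positions(n, box, seed=0):
    """Hypothetical helper: n uniformly random points in a cubic box [0, box)^3."""
    rng = random.Random(seed)
    return [(rng.random() * box, rng.random() * box, rng.random() * box)
            for _ in range(n)]


def build_cell_list(positions, box, cutoff):
    """Bin particle indices into cubic cells whose edge is at least `cutoff`."""
    n_cells = max(1, int(box // cutoff))  # number of cells per dimension
    edge = box / n_cells                  # edge >= cutoff by construction
    cells = {}
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x / edge) % n_cells,
               int(y / edge) % n_cells,
               int(z / edge) % n_cells)
        cells.setdefault(key, []).append(idx)
    return cells, n_cells, edge


def min_image_dist2(a, b, box):
    """Squared distance between a and b under the minimum-image convention."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)  # wrap the separation into [-box/2, box/2]
        d2 += d * d
    return d2


def neighbors_cell_list(i, positions, box, cutoff, cells, n_cells, edge):
    """Neighbors of particle i, searching only its own cell and the 26 adjacent ones."""
    x, y, z = positions[i]
    cx, cy, cz = (int(x / edge) % n_cells,
                  int(y / edge) % n_cells,
                  int(z / edge) % n_cells)
    # A set avoids visiting the same cell twice when the grid has fewer than 3 cells per side.
    keys = {((cx + dx) % n_cells, (cy + dy) % n_cells, (cz + dz) % n_cells)
            for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3)}
    found = []
    for key in keys:
        for j in cells.get(key, ()):
            if j != i and min_image_dist2(positions[i], positions[j], box) < cutoff * cutoff:
                found.append(j)
    return sorted(found)


def neighbors_brute_force(i, positions, box, cutoff):
    """Reference O(N) scan over all particles, used to check the cell list result."""
    return sorted(j for j in range(len(positions))
                  if j != i and min_image_dist2(positions[i], positions[j], box) < cutoff * cutoff)
```

The cell list restricts each neighbor search to at most 27 cells, so its cost per particle depends on the local density rather than on the total particle count. For small systems those 27 cells cover most of the box, which is one plausible reading of why the abstract reports no benefit at small sizes.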
As an emerging trend in graph-based deep learning, Graph Neural Networks (GNNs) excel at generating high-quality node feature vectors (embeddings). However, existing one-size-fits-all GNN implementations cannot keep up with evolving GNN architectures, ever-increasing graph sizes, and diverse node embedding dimensionalities. To this end, we propose GNNAdvisor, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms. First, GNNAdvisor explores and identifies several performance-relevant features from both the GNN model and the input graph, and uses them as a new driving force for GNN acceleration. Second, GNNAdvisor implements a novel and highly efficient 2D workload management scheme, tailored for GNN computation, to improve GPU utilization and performance under different application settings. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads. Furthermore, to enable automatic runtime optimization, GNNAdvisor incorporates a lightweight analytical model for an effective design parameter search. Extensive experiments show that GNNAdvisor outperforms state-of-the-art GNN computing frameworks, such as Deep Graph Library ($3.02\times$ faster on average) and NeuGraph (up to $4.10\times$ faster), on mainstream GNN architectures across various datasets.
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language (OpenCL) framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units (CPUs) and GPUs.
Latent Dirichlet Allocation (LDA) is a statistical approach for topic modeling with a wide range of applications. However, there have been very few attempts to accelerate LDA on GPUs, which offer exceptional computing and memory throughput. To this end, we introduce EZLDA, which achieves efficient and scalable LDA training on GPUs through the following three contributions. First, EZLDA introduces a three-branch sampling method that takes advantage of the convergence heterogeneity of various tokens to reduce redundant sampling work. Second, to enable a sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce a hybrid format for W along with the corresponding token partition to T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance on the GPU and to scale EZLDA across multiple GPUs. Taken together, EZLDA achieves superior performance over state-of-the-art attempts with lower memory consumption.
Monte Carlo simulations are widely used in many areas, including particle accelerators. In this lecture, after a short introduction and a review of some statistical background, we will discuss methods for sampling from a probability distribution function, such as direct inversion, the rejection method, and Markov chain Monte Carlo, as well as variance reduction methods for evaluating numerical integrals with Monte Carlo simulation. We will also briefly introduce quasi-Monte Carlo sampling at the end of this lecture.
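Two of the sampling methods named in the abstract above, direct inversion and the rejection method, can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not from the lecture: the target distributions (an exponential distribution and the density f(x) = 2x on [0, 1]) are chosen only because their means are known, and all function names are hypothetical.

```python
import math
import random


def sample_exponential(rate, rng):
    """Direct inversion: solve F(x) = u for the CDF F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                      # uniform on [0, 1)
    return -math.log(1.0 - u) / rate      # 1 - u lies in (0, 1], so the log is finite


def sample_rejection(pdf, pdf_max, lo, hi, rng):
    """Rejection method with a uniform proposal on [lo, hi].

    `pdf_max` must be an upper bound on `pdf` over [lo, hi].
    """
    while True:
        x = rng.uniform(lo, hi)           # propose a candidate
        if rng.random() * pdf_max <= pdf(x):
            return x                      # accept with probability pdf(x) / pdf_max


def linear_pdf(x):
    """Illustrative target density f(x) = 2x on [0, 1]; its exact mean is 2/3."""
    return 2.0 * x


def sample_means(n=100_000, seed=0):
    """Monte Carlo estimates of the two means (exact values: 1/rate = 0.5 and 2/3)."""
    rng = random.Random(seed)
    exp_mean = sum(sample_exponential(2.0, rng) for _ in range(n)) / n
    rej_mean = sum(sample_rejection(linear_pdf, 2.0, 0.0, 1.0, rng) for _ in range(n)) / n
    return exp_mean, rej_mean
```

Direct inversion needs a CDF that can be inverted in closed form. The rejection method only needs pointwise evaluation of the density and a bound on it, at the cost of discarding some proposals: with the uniform proposal used here, about half of them are rejected.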
We have developed a Monte Carlo simulation for ion transport in hot background gases, which is an alternative way of solving the corresponding Boltzmann equation that determines the distribution function of ions. We consider the limit of low ion densities, in which the distribution function of the background gas remains unchanged by collisions with ions. Special attention has been paid to the proper treatment of the thermal motion of the host gas particles and their influence on ions, which is very important at low electric fields, when the mean ion energy is comparable to the thermal energy of the host gas. We found the conditional probability distribution of gas velocities for an ion of a specific velocity that collides with a gas particle. We have also derived exact analytical formulas for piecewise calculation of the collision frequency integrals. We address the cases in which the background gas is monocomponent and in which it is a mixture of different gases. The techniques described here are required for Monte Carlo simulations of ion transport and for hybrid models of non-equilibrium plasmas. The range of energies where it is necessary to apply the technique has been defined. The results we obtained are in excellent agreement with existing ones obtained by complementary methods. Having verified our algorithm, we were able to produce calculations for Ar$^+$ ions in Ar and propose them as a new benchmark for thermal effects. The developed method is widely applicable for solving the Boltzmann equation, which appears in many different contexts in physics.