ترغب بنشر مسار تعليمي؟ اضغط هنا

Large Scale GPU Accelerated PPMLR-MHD Simulations for Space Weather Forecast

465   0   0.0 ( 0 )
 نشر من قبل Zhihui Du
 تاريخ النشر 2016
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

PPMLR-MHD is a new magnetohydrodynamics (MHD) model used to simulate the interactions of the solar wind with the magnetosphere, which has been proved to be the key element of the space weather cause-and-effect chain process from the Sun to Earth. Compared to existing MHD methods, PPMLR-MHD achieves the advantage of high order spatial accuracy and low numerical dissipation. However, the accuracy comes at a cost. On one hand, this method requires more intensive computation. On the other hand, more boundary data is subject to be transferred during the process of simulation.s In this work, we present a parallel hybrid solution of the PPMLR-MHD model implemented using the computing capabilities of both CPUs and GPUs. We demonstrate that our optimized implementation alleviates the data transfer overhead by using GPU Direct technology and can scale up to 151 processes and achieve significant performance gains by distributing the workload among the CPUs and GPUs on Titan at Oak Ridge National Laboratory. The performance results show that our implementation is fast enough to carry out highly accurate MHD simulations in real time.

قيم البحث

اقرأ أيضاً

Magnetohydrodynamical (MHD) dynamos emerge in many different astrophysical situations where turbulence is present, but the interaction between large-scale (LSD) and small-scale dynamos (SSD) is not fully understood. We performed a systematic study of turbulent dynamos driven by isotropic forcing in isothermal MHD with magnetic Prandtl number of unity, focusing on the exponential growth stage. Both helical and non-helical forcing was employed to separate the effects of LSD and SSD in a periodic domain. Reynolds numbers (Rm) up to $approx 250$ were examined and multiple resolutions used for convergence checks. We ran our simulations with the Astaroth code, designed to accelerate 3D stencil computations on graphics processing units (GPUs) and to employ multiple GPUs with peer-to-peer communication. We observed a speedup of $approx 35$ in single-node performance compared to the widely used multi-CPU MHD solver Pencil Code. We estimated the growth rates both from the averaged magnetic fields and their power spectra. At low Rm, LSD growth dominates, but at high Rm SSD appears to dominate in both helically and non-helically forced cases. Pure SSD growth rates follow a logarithmic scaling as a function of Rm. Probability density functions of the magnetic field from the growth stage exhibit SSD behaviour in helically forced cases even at intermediate Rm. We estimated mean-field turbulence transport coefficients using closures like the second-order correlation approximation (SOCA). They yield growth rates similar to the directly measured ones and provide evidence of $alpha$ quenching. Our results are consistent with the SSD inhibiting the growth of the LSD at moderate Rm, while the dynamo growth is enhanced at higher Rm.
69 - Alexei A. Pevtsov 2016
In the United States, scientific research in space weather is funded by several Government Agencies including the National Science Foundation (NSF) and the National Aeronautics and Space Agency (NASA). For commercial purposes, space weather forecast is made by the Space Weather Prediction Center (SWPC) of the National Oceanic and Atmospheric Administration (NOAA). Observations come from the network of groundbased observatories funded via various sources, as well as from the instruments on spacecraft. Numerical models used in forecast are developed in the framework of individual research projects. Later, the most promising models are selected for additional testing at SWPC. In order to increase the application of models in research and education, NASA in collaboration with other agencies created Community Coordinated Modeling Center (CCMC). In mid-1990, US scientific community presented compelling evidence for developing the National Program on Space Weather, and in 1995, such program has been formally created. In 2015, the National Council on Science and Technology issued two documents: the National Space Weather Strategy [1] and the Action Plan [2]. In the near future, these two documents will define the development of Space Weather research and forecasting activity in USA. Both documents emphasize the need for close international collaboration in area of space weather.
With widespread advances in machine learning, a number of large enterprises are beginning to incorporate machine learning models across a number of products. These models are typically trained on shared, multi-tenant GPU clusters. Similar to existing cluster computing workloads, scheduling frameworks aim to provide features like high efficiency, resource isolation, fair sharing across users, etc. However Deep Neural Network (DNN) based workloads, predominantly trained on GPUs, differ in two significant ways from traditional big data analytics workloads. First, from a cluster utilization perspective, GPUs represent a monolithic resource that cannot be shared at a fine granularity across users. Second, from a workload perspective, deep learning frameworks require gang scheduling reducing the flexibility of scheduling and making the jobs themselves inelastic to failures at runtime. In this paper we present a detailed workload characterization of a two-month long trace from a multi-tenant GPU cluster in a large enterprise. By correlating scheduler logs with logs from individual jobs, we study three distinct issues that affect cluster utilization for DNN training workloads on multi-tenant clusters: (1) the effect of gang scheduling and locality constraints on queuing, (2) the effect of locality on GPU utilization, and (3) failures during training. Based on our experience running a large-scale operation, we provide design guidelines pertaining to next-generation cluster schedulers for DNN training workloads.
We present a Gauss-Newton-Krylov solver for large deformation diffeomorphic image registration. We extend the publicly available CLAIRE library to multi-node multi-graphics processing unit (GPUs) systems and introduce novel algorithmic modifications that significantly improve performance. Our contributions comprise ($i$) a new preconditioner for the reduced-space Gauss-Newton Hessian system, ($ii$) a highly-optimized multi-node multi-GPU implementation exploiting device direct communication for the main computational kernels (interpolation, high-order finite difference operators and Fast-Fourier-Transform), and ($iii$) a comparison with state-of-the-art CPU and GPU implementations. We solve a $256^3$-resolution image registration problem in five seconds on a single NVIDIA Tesla V100, with a performance speedup of 70% compared to the state-of-the-art. In our largest run, we register $2048^3$ resolution images (25 B unknowns; approximately 152$times$ larger than the largest problem solved in state-of-the-art GPU implementations) on 64 nodes with 256 GPUs on TACCs Longhorn system.
To address the challenge of performance analysis on the US DOEs forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help deve lopers understand the performance of accelerated applications as a whole, HPCToolkits measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkits new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا