
MD Simulation of Hundred-Billion-Metal-Atom Cascade Collision on Sunway Taihulight

Added by Genshen Chu
Publication date: 2021
Language: English





Radiation damage to the steel of reactor pressure vessels is a major threat to nuclear reactor safety. It is caused by metal-atom cascade collisions, initiated when atoms are struck by a high-energy neutron. This paper presents MISA-MD, a new molecular dynamics implementation for simulating such cascade collisions with the EAM potential. MISA-MD provides (1) a hash-based data structure to efficiently store atoms and find their neighbors, and (2) several acceleration and optimization strategies for the SW26010 processor of the Sunway TaihuLight supercomputer, including an efficient potential-table storage and interpolation method, a coloring method to avoid write conflicts, and double-buffering and data-reuse strategies. The experimental results demonstrate that MISA-MD has good accuracy and scalability, obtaining a parallel efficiency of over 79% for a 655-billion-atom system. Compared with LAMMPS, a state-of-the-art MD program, MISA-MD requires less memory and achieves better computational performance.
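The abstract does not detail MISA-MD's hash-based atom storage, but the general idea of bucketing atoms by integer cell coordinates, so that a neighbor search only probes the 27 surrounding cells instead of scanning all atoms, can be sketched as below. This is a minimal illustrative Python sketch, not the MISA-MD data structure; the cutoff, box size, and function names are assumptions.

```python
# Minimal sketch of a hash-based cell list for neighbor search (illustrative only;
# not the MISA-MD data structure). Atoms are bucketed by integer cell coordinates,
# so finding neighbors within a cutoff only probes the 27 surrounding cells.
from collections import defaultdict
import itertools
import random

CUTOFF = 2.5                      # interaction cutoff (arbitrary units, assumed)
CELL = CUTOFF                     # cell edge >= cutoff guarantees no neighbor is missed

def cell_key(pos):
    """Map a 3D position to the integer coordinates of its cell."""
    return tuple(int(c // CELL) for c in pos)

def build_cells(positions):
    """Hash every atom index into the cell that contains it."""
    cells = defaultdict(list)
    for i, p in enumerate(positions):
        cells[cell_key(p)].append(i)
    return cells

def neighbors(i, positions, cells):
    """Return indices of atoms within CUTOFF of atom i, probing only 27 cells."""
    cx, cy, cz = cell_key(positions[i])
    result = []
    for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
        for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 <= CUTOFF ** 2:
                result.append(j)
    return result

if __name__ == "__main__":
    random.seed(0)
    atoms = [tuple(random.uniform(0, 20) for _ in range(3)) for _ in range(5000)]
    cells = build_cells(atoms)
    print("atom 0 has", len(neighbors(0, atoms, cells)), "neighbors within cutoff")
```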



Related research

Boson sampling is expected to be one of the important milestones toward demonstrating quantum supremacy. The present work establishes the benchmarking of Gaussian boson sampling (GBS) with threshold detection on the Sunway TaihuLight supercomputer. To achieve the best performance and provide a competitive scenario for future quantum computing studies, the selected simulation algorithm is fully optimized through a set of innovative approaches, including a parallel scheme and an instruction-level optimization method. Furthermore, data precision and instruction scheduling are handled in a sophisticated manner by an adaptive precision optimization scheme and a DAG-based heuristic search algorithm, respectively. Based on these methods, a highly efficient and parallel quantum sampling algorithm is designed. The largest run enables us to evaluate one Torontonian function of a 100 x 100 submatrix from 50-photon GBS within 20 hours in 128-bit precision and within 2 days in 256-bit precision.
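For reference, the Torontonian used in threshold-detection GBS is commonly defined as an inclusion-exclusion sum over subsets of modes, $\mathrm{Tor}(O) = \sum_{Z} (-1)^{n-|Z|} / \sqrt{\det(I - O_Z)}$. The sketch below is a brute-force $O(2^n)$ reference implementation of that definition in Python; it only shows the structure of the computation and is in no way the paper's optimized, multi-precision, massively parallel code. The test matrix and its scaling are assumptions.

```python
# Brute-force Torontonian via inclusion-exclusion over subsets of modes
# (reference sketch only; exponential cost, no parallelism, double precision).
import itertools
import numpy as np

def torontonian(O):
    """Tor(O) = sum_Z (-1)^(n-|Z|) / sqrt(det(I - O_Z)) for a 2n x 2n matrix O."""
    n = O.shape[0] // 2                        # modes paired as indices (i, i + n)
    total = 0.0 + 0.0j
    for r in range(n + 1):
        for Z in itertools.combinations(range(n), r):
            if r == 0:
                det = 1.0                      # determinant of the empty submatrix
            else:
                idx = list(Z) + [z + n for z in Z]   # keep both quadratures of modes in Z
                det = np.linalg.det(np.eye(2 * r) - O[np.ix_(idx, idx)])
            total += (-1) ** (n - r) / np.sqrt(complex(det))
    return total

if __name__ == "__main__":
    # Small random symmetric test matrix, scaled so that I - O_Z stays well conditioned.
    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((2 * n, 2 * n))
    O = 0.05 * (A + A.T)
    print("Tor(O) =", torontonian(O))
```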
High performance computing (HPC) is a powerful tool to accelerate Kohn-Sham density functional theory (KS-DFT) calculations on modern heterogeneous supercomputers. Here, we describe a massively parallel, extreme-scale, and portable implementation of the discontinuous Galerkin density functional theory (DGDFT) method on the Sunway TaihuLight supercomputer. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field (SCF) iteration to solve the KS equations with precision comparable to that of a plane-wave basis set. In particular, the DGDFT method adopts a two-level parallelization strategy that makes use of different data distribution, task scheduling, and data communication schemes, combined with the master-slave multi-thread heterogeneous parallelism of the SW26010 processor, enabling extreme-scale KS-DFT calculations on the Sunway TaihuLight supercomputer. We show that the DGDFT method can scale up to 8,519,680 processing cores (131,072 core groups) on the Sunway TaihuLight supercomputer for investigating the electronic structures of two-dimensional (2D) metallic graphene systems containing tens of thousands of carbon atoms.
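As a rough illustration of the two-level idea only (not the actual DGDFT data distribution or its communication schemes), the toy Python sketch below maps flat process ranks to an (element, inner-rank) pair, with an outer level over DG elements and an inner level within each element; the sizes and the leader/worker roles are assumptions for illustration.

```python
# Toy sketch of a two-level process decomposition: an outer level over DG elements
# and an inner level within each element. Real DGDFT uses MPI communicators plus the
# SW26010 master-slave threading; here we only show the rank -> (element, inner)
# mapping arithmetic with illustrative sizes.
N_ELEMENTS = 8            # number of DG elements (assumed)
PROCS_PER_ELEMENT = 4     # inner-level processes per element (assumed)
N_PROCS = N_ELEMENTS * PROCS_PER_ELEMENT

def two_level_coords(rank):
    """Split a flat rank into (element index, inner rank within that element)."""
    return divmod(rank, PROCS_PER_ELEMENT)

if __name__ == "__main__":
    for rank in range(N_PROCS):
        elem, inner = two_level_coords(rank)
        # Hypothetical role assignment: inner rank 0 of each element acts as the
        # element leader; the others share the element-local work.
        role = "element leader" if inner == 0 else "worker"
        print(f"rank {rank:2d} -> element {elem}, inner {inner} ({role})")
```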
Many eigensolvers such as ARPACK and Anasazi have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM: they run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger problems. In contrast, we develop an SSD-based eigensolver framework called FlashEigen, which extends the Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph with hundreds of millions or even billions of vertices on a single machine. FlashEigen performs sparse matrix multiplication in a semi-external memory fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in memory. We store the entire vector subspace on SSDs and reduce I/O by caching the most recent dense matrix to improve performance. Our results show that FlashEigen is able to achieve 40%-60% of the performance of its in-memory implementation and has performance comparable to the Anasazi eigensolvers on a machine with 48 CPU cores. Furthermore, it is capable of scaling to a graph with 3.4 billion vertices and 129 billion edges. It takes about four hours to compute eight eigenvalues of the billion-node graph using 120 GB of memory.
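The semi-external-memory sparse-times-dense product at the heart of this design can be sketched as follows: the sparse matrix is stored on SSD in row blocks and streamed through memory one block at a time, while the dense matrix stays resident. This is a minimal Python/SciPy illustration with made-up file names, block size, and matrix sizes, not FlashEigen's actual I/O layer.

```python
# Minimal sketch of semi-external-memory sparse x dense multiplication: the sparse
# matrix lives on disk in row blocks and is streamed, while the dense matrix (the
# vector subspace) stays in memory. All names and sizes are illustrative.
import os
import numpy as np
import scipy.sparse as sp

BLOCK_ROWS = 1000

def write_blocks(A, directory):
    """Split a sparse matrix into row blocks and store each block on disk."""
    os.makedirs(directory, exist_ok=True)
    n = A.shape[0]
    for b, start in enumerate(range(0, n, BLOCK_ROWS)):
        sp.save_npz(os.path.join(directory, f"block_{b}.npz"),
                    A[start:start + BLOCK_ROWS].tocsr())
    return (n + BLOCK_ROWS - 1) // BLOCK_ROWS

def stream_multiply(directory, n_blocks, X):
    """Compute A @ X one on-disk row block at a time; only X and one block in RAM."""
    parts = []
    for b in range(n_blocks):
        block = sp.load_npz(os.path.join(directory, f"block_{b}.npz"))
        parts.append(block @ X)                  # partial result for this row range
    return np.vstack(parts)

if __name__ == "__main__":
    A = sp.random(4000, 4000, density=1e-3, random_state=42, format="csr")
    X = np.random.default_rng(0).standard_normal((4000, 8))  # in-memory dense "subspace"
    n_blocks = write_blocks(A, "spmm_blocks")
    Y = stream_multiply("spmm_blocks", n_blocks, X)
    print(np.allclose(Y, A @ X))                 # sanity check against in-memory product
```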
We suggest a theoretical description of the force-induced translocation dynamics of a polymer chain through a nanopore. Our consideration is based on the tensile (Pincus) blob picture of a pulled chain and the notion of a propagating front of tensile force along the chain backbone, suggested recently by T. Sakaue. The driving force is associated with a chemical potential gradient that acts on each chain segment inside the pore. Depending on its strength, different regimes of polymer motion (named after the typical chain conformation: trumpet, stem-trumpet, etc.) occur. Assuming that the local driving and drag forces are equal (i.e., in a quasi-static approximation), we derive an equation of motion for the tensile front position $X(t)$. We show that the scaling law for the average translocation time $\langle\tau\rangle$ changes from $\langle\tau\rangle \sim N^{2\nu}/f^{1/\nu}$ to $\langle\tau\rangle \sim N^{1+\nu}/f$ (for the free-draining case) as the dimensionless force $\widetilde{f}_{R} = a N^{\nu} f / T$ (where $a$, $N$, $\nu$, $f$, $T$ are the Kuhn segment length, the chain length, the Flory exponent, the driving force, and the temperature, respectively) increases. These and other predictions are tested by Molecular Dynamics (MD) simulation. Data from our computer experiments indeed indicate that the translocation scaling exponent $\alpha$ grows with the pulling force $\widetilde{f}_{R}$, albeit the observed exponent $\alpha$ stays systematically smaller than the theoretically predicted value. This might be associated with fluctuations, which are neglected in the quasi-static approximation.
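To make the quoted crossover concrete, the short Python sketch below evaluates the dimensionless force $\widetilde{f}_{R} = a N^{\nu} f / T$ and the two asymptotic scalings of $\langle\tau\rangle$ with prefactors omitted; all parameter values are illustrative, not those of the paper's simulations.

```python
# Numerical illustration of the two scaling regimes quoted above (prefactors
# omitted, parameter values illustrative): weak pulling gives tau ~ N**(2*nu)/f**(1/nu),
# strong pulling (free-draining) gives tau ~ N**(1+nu)/f. The crossover is
# controlled by the dimensionless force f_R = a * N**nu * f / T.
a, nu, T = 1.0, 0.588, 1.0        # Kuhn length, Flory exponent (3D), temperature

def dimensionless_force(N, f):
    return a * N**nu * f / T

def tau_weak(N, f):
    return N**(2 * nu) / f**(1 / nu)      # weak-force scaling (up to a prefactor)

def tau_strong(N, f):
    return N**(1 + nu) / f                # strong-force, free-draining scaling

if __name__ == "__main__":
    N = 256
    for f in (0.01, 0.1, 1.0, 10.0):
        print(f"f={f:5.2f}  f_R={dimensionless_force(N, f):8.2f}  "
              f"tau_weak~{tau_weak(N, f):12.1f}  tau_strong~{tau_strong(N, f):12.1f}")
```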
Quantum many-body systems (QMBs) are some of the most challenging physical systems to simulate numerically. Methods involving approximations for tensor network (TN) contractions have proven to be viable alternatives to algorithms such as quantum Monte Carlo or simulated annealing. However, these methods are cumbersome, difficult to implement, and often have significant limitations in their accuracy and efficiency when considering systems in more than one dimension. In this paper, we explore the exact computation of TN contractions on two-dimensional geometries and present a heuristic improvement of TN contraction that reduces the computing time, the amount of memory, and the communication time. We run our algorithm for the Ising model using memory-optimized x1.32xlarge instances on Amazon Web Services (AWS) Elastic Compute Cloud (EC2). Our results show that cloud computing is a viable alternative to supercomputers for this class of scientific applications.
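The exact contraction of a small 2D Ising tensor network can be illustrated by sweeping a boundary tensor across the lattice column by column and checking the result against brute-force enumeration, as in the Python sketch below. The lattice size, coupling, and temperature are illustrative, and the paper's heuristic contraction-order optimization and cloud deployment are not reproduced here.

```python
# Exact contraction of a small 2D Ising network by absorbing one column of the
# lattice at a time into a boundary tensor (no approximation), verified against
# brute-force enumeration of all spin configurations. Toy parameters only.
import itertools
import numpy as np

L, J, beta = 3, 1.0, 0.4          # L x L open lattice (illustrative parameters)

def spins(state):
    """Decode an integer into a column of L spins valued +/-1."""
    return np.array([1 if (state >> r) & 1 else -1 for r in range(L)])

def column_weight(col):
    return np.exp(beta * J * np.sum(col[:-1] * col[1:]))       # vertical bonds

def bond_weight(col_a, col_b):
    return np.exp(beta * J * np.sum(col_a * col_b))             # horizontal bonds

def contract():
    states = [spins(s) for s in range(2 ** L)]
    w_col = np.array([column_weight(c) for c in states])
    W = np.array([[bond_weight(a, b) for b in states] for a in states])
    v = w_col.copy()                       # boundary tensor after the first column
    for _ in range(L - 1):
        v = (v @ W) * w_col                # absorb one more column exactly
    return v.sum()

def brute_force():
    Z = 0.0
    for cfg in itertools.product((-1, 1), repeat=L * L):
        s = np.array(cfg).reshape(L, L)
        E = np.sum(s[:, :-1] * s[:, 1:]) + np.sum(s[:-1, :] * s[1:, :])
        Z += np.exp(beta * J * E)
    return Z

if __name__ == "__main__":
    print("contracted Z =", contract(), " brute force Z =", brute_force())
```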
