No Arabic abstract
Tucker decomposition is proposed to reduce the memory requirement of the far-fields in the fast multipole method (FMM)-accelerated surface integral equation simulators. It is particularly used to compress the far-fields of FMM groups, which are stored in three-dimensional (3-D) arrays (or tensors). The compressed tensors are then used to perform fast tensor-vector multiplications during the aggregation and disaggregation stages of the FMM. For many practical scenarios, the proposed Tucker decomposition yields a significant reduction in the far-fields memory requirement while dramatically accelerating the aggregation and disaggregation stages. For the electromagnetic scattering analysis of a 30{lambda}-diameter sphere, it reduces the memory requirement of the far-fields more than 87% while it expedites the aggregation and disaggregation stages by a factor of 15.8 and 15.2, respectively, where {lambda} is the wavelength in free space.
We developed a fast numerical algorithm for solving the three dimensional vectorial Helmholtz equation that arises in electromagnetic scattering problems. The algorithm is based on electric field integral equations and is essentially a boundary element method. Nystroms quadrature rule with a triangular grid is employed to linearize the integral equations, which are then solved by using a right-preconditioned iterative method. We apply the fast multipole technique to accelerate the matrix-vector multiplications in the iterations. We demonstrate the broad applications and accuracy of this method with practical examples including dielectric, plasmonic and metallic objects. We then apply the method to investigate the plasmonic properties of a silver torus and a silver split-ring resonator under the incidence of an electromagnetic plane wave. We show the silver torus can be used as a trapping tool to bind small dielectric or metallic particles.
This paper is concerned with the Tucker decomposition based low rank tensor completion problem, which is about reconstructing a tensor $mathcal{T}inmathbb{R}^{ntimes ntimes n}$ of a small multilinear rank from partially observed entries. We study the convergence of the Riemannian gradient method for this problem. Guaranteed linear convergence in terms of the infinity norm has been established for this algorithm provided the number of observed entries is essentially in the order of $O(n^{3/2})$. The convergence analysis relies on the leave-one-out technique and the subspace projection structure within the algorithm. To the best of our knowledge, this is the first work that has established the entrywise convergence of a non-convex algorithm for low rank tensor completion via Tucker decomposition.
Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue to the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, which we contrast to most compression methods that are focused on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 terabyte and 6.7 terabyte data sets distributed across 100s of nodes (1000s of MPI processes), achieving compression rates between 100-200,000$times$ which equates to 99-99.999% compression (depending on the desired accuracy) in substantially less time than it would take to even read the same dataset from a parallel filesystem. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer so long as the reconstructed portion is small enough to fit on a single machine, e.g., in the instance of reconstructing/visualizing a single down-sampled time step or computing summary statistics.
Smoothed particle hydrodynamics (SPH) is positioned as having ideal conservation properties. When properly implemented, conservation of total mass, energy, and both linear and angular momentum is guaranteed exactly, up to machine precision. This is particularly important for some applications in computational astrophysics, such as binary dynamics, mergers, and accretion of compact objects (neutron stars, black holes, and white dwarfs). However, in astrophysical applications that require the inclusion of gravity, calculating pairwise particle interactions becomes prohibitively expensive. In the Fast Multipole Method (FMM), they are, therefore, replaced with symmetric interactions between distant clusters of particles (contained in the tree nodes) Although such an algorithm is linear momentum-conserving, it introduces spurious torques that violate conservation of angular momentum. We present a modification of FMM that is free of spurious torques and conserves angular momentum explicitly. The new method has practically no computational overhead compared to the standard FMM.
It is proposed a new method of compressing laser pulse by fast extending plasma gratings(FEPG), which is created by ionizing the hypersound wave generated by stimulated Brillouin scattering(SBS) in the background gas. Ionized by a short laser pulse, the phonon forms a light-velocity FEPG to fully reflect a resonant pump laser. As the reflecting surface moves with a light velocity, the reflected pulse is temporally overlapped and compressed. This regime is supported by the simulation results of a fully kinetic particle-in-cell(PIC) code Opic with a laser wavelength of 1um, displaying a pump pulse is compressed from 13ps to a few cycles(7.2fs), with an efficiency close to 80%. It is a promising method to produce critical laser powers due to several features: high efficiency without a linear stage, robustness to plasma instabilities, no seed and a wide range of pump intensity.