ﻻ يوجد ملخص باللغة العربية
CP tensor decomposition with alternating least squares (ALS) is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel that is necessary to set up the quadratic optimization subproblems. State-of-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order N input tensor and the first-contracted input matrix once every (N-1)/N sweeps. This algorithm reduces the leading order computational cost by a factor of 2(N-1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results show that the per-sweep time achieves 1.25X speed-up for MSDT and 1.94X speed-up for pairwise perturbation compared to the state-of-art dimension trees running on 1024 processors on the Stampede2 supercomputer.
The alternating least squares algorithm for CP and Tucker decomposition is dominated in cost by the tensor contractions necessary to set up the quadratic optimization subproblems. We introduce a novel family of algorithms that uses perturbative corre
Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank ap
We introduce an adaptive element-based domain decomposition (DD) method for solving saddle point problems defined as a block two by two matrix. The algorithm does not require any knowledge of the constrained space. We assume that all sub matrices are
In this work we formally derive and prove the correctness of the algorithms and data structures in a parallel, distributed-memory, generic finite element framework that supports h-adaptivity on computational domains represented as forest-of-trees. Th
Quaternion singular value decomposition (QSVD) is a robust technique of digital watermarking which can extract high quality watermarks from watermarked images with low distortion. In this paper, QSVD technique is further investigated and an efficient