
Sparse Nonnegative Convolution Is Equivalent to Dense Nonnegative Convolution

Posted by Nick Fischer
Publication date: 2021
Research field: Computer Science
Language: English





Computing the convolution $A \star B$ of two length-$n$ vectors $A, B$ is a ubiquitous computational primitive. Applications range from string problems to Knapsack-type problems, and from 3SUM to All-Pairs Shortest Paths. These applications often come in the form of nonnegative convolution, where the entries of $A, B$ are nonnegative integers. The classical algorithm to compute $A \star B$ uses the Fast Fourier Transform and runs in time $O(n \log n)$. However, often $A$ and $B$ satisfy sparsity conditions, and hence one could hope for significant improvements. The ideal goal is an $O(k \log k)$-time algorithm, where $k$ is the number of non-zero elements in the output, i.e., the size of the support of $A \star B$. This problem is referred to as sparse nonnegative convolution, and has received considerable attention in the literature; the fastest algorithms to date run in time $O(k \log^2 n)$. The main result of this paper is the first $O(k \log k)$-time algorithm for sparse nonnegative convolution. Our algorithm is randomized and assumes that the length $n$ and the largest entry of $A$ and $B$ are subexponential in $k$. Surprisingly, we can phrase our algorithm as a reduction from the sparse case to the dense case of nonnegative convolution, showing that, under some mild assumptions, sparse nonnegative convolution is equivalent to dense nonnegative convolution for constant-error randomized algorithms. Specifically, if $D(n)$ is the time to convolve two nonnegative length-$n$ vectors with success probability $2/3$, and $S(k)$ is the time to convolve two nonnegative vectors with output size $k$ with success probability $2/3$, then $S(k) = O(D(k) + k (\log\log k)^2)$. Our approach uses a variety of new techniques in combination with some old machinery from linear sketching and structured linear algebra, as well as new insights on linear hashing, the most classical hash function.
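To make the dense baseline concrete, here is a minimal Python/NumPy sketch (not from the paper; all names are illustrative) of nonnegative convolution via a single FFT, together with the output support size $k$ that the sparse running times above are measured in.

import numpy as np

def dense_convolution(A, B):
    """Compute the convolution A * B of two nonnegative integer vectors
    with one FFT of length ~2n, i.e. the O(n log n) dense baseline."""
    n = len(A) + len(B) - 1
    fa = np.fft.rfft(A, n)
    fb = np.fft.rfft(B, n)
    # Entries are nonnegative integers, so rounding the inverse FFT is exact
    # as long as the values fit comfortably in double precision.
    return np.rint(np.fft.irfft(fa * fb, n)).astype(np.int64)

A = np.array([0, 3, 0, 0, 5, 0, 0, 0, 2])
B = np.array([1, 0, 0, 0, 0, 4, 0, 0, 0])
C = dense_convolution(A, B)
k = np.count_nonzero(C)          # output support size, the parameter k above
print(C, "support size k =", k)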


Read also

Computing the convolution $A \star B$ of two length-$n$ integer vectors $A, B$ is a core problem in several disciplines. It frequently comes up in algorithms for Knapsack, $k$-SUM, All-Pairs Shortest Paths, and string pattern matching problems. For these applications it typically suffices to compute convolutions of nonnegative vectors. This problem can be classically solved in time $O(n \log n)$ using the Fast Fourier Transform. However, often the involved vectors are sparse and hence one could hope for output-sensitive algorithms to compute nonnegative convolutions. This question was raised by Muthukrishnan and solved by Cole and Hariharan (STOC 02) by a randomized algorithm running in near-linear time in the (unknown) output-size $t$. Chan and Lewenstein (STOC 15) presented a deterministic algorithm with a $2^{O(\sqrt{\log t \cdot \log\log n})}$ overhead in running time and the additional assumption that a small superset of the output is given; this assumption was later removed by Bringmann and Nakos (ICALP 21). In this paper we present the first deterministic near-linear-time algorithm for computing sparse nonnegative convolutions. This immediately gives improved deterministic algorithms for the state of the art of output-sensitive Subset Sum, block-mass pattern matching, $N$-fold Boolean convolution, and others, matching up to log-factors the fastest known randomized algorithms for these problems. Our algorithm is a blend of algebraic and combinatorial ideas and techniques. Additionally, we provide two fast Las Vegas algorithms for computing sparse nonnegative convolutions. In particular, we present a simple $O(t \log^2 t)$ time algorithm, which is an accessible alternative to Cole and Hariharan's algorithm. We further refine this new algorithm to run in Las Vegas time $O(t \log t \cdot \log\log t)$, matching the running time of the dense case apart from the $\log\log t$ factor.
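For illustration only, the following Python/NumPy sketch shows the classical hashing idea behind output-sensitive sparse convolution, in the spirit of the Cole-Hariharan line of work: hash indices modulo a small prime, convolve the short hashed vectors densely, and use an index-weighted copy to recover positions. It is not the deterministic or Las Vegas algorithms of this paper, and collision detection/repair is omitted.

import numpy as np

def cyclic_convolution(x, y):
    # Cyclic convolution of two equal-length real vectors via FFT.
    return np.rint(np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), len(x)))

def sparse_convolution_sketch(A, B, p):
    """Recover (position, value) pairs of C = A * B, assuming every bucket
    i mod p receives at most one nonzero output entry (no collisions)."""
    Ah, Bh = np.zeros(p), np.zeros(p)    # hashed vectors
    Aw, Bw = np.zeros(p), np.zeros(p)    # index-weighted hashed vectors
    for i, a in enumerate(A):
        Ah[i % p] += a
        Aw[i % p] += i * a
    for j, b in enumerate(B):
        Bh[j % p] += b
        Bw[j % p] += j * b
    plain = cyclic_convolution(Ah, Bh)                                   # sum of C[k] per bucket
    weighted = cyclic_convolution(Aw, Bh) + cyclic_convolution(Ah, Bw)   # sum of k * C[k] per bucket
    out = {}
    for s in range(p):
        if plain[s] > 0:                 # nonnegativity: no cancellation can hide a bucket
            k = int(round(weighted[s] / plain[s]))
            out[k] = int(plain[s])
    return out

A = np.zeros(1000); A[[3, 170, 911]] = [2, 5, 7]
B = np.zeros(1000); B[[0, 40]] = [1, 3]
print(sparse_convolution_sketch(A, B, p=37))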
In the Nonnegative Matrix Factorization (NMF) problem we are given an $n \times m$ nonnegative matrix $M$ and an integer $r > 0$. Our goal is to express $M$ as $A W$ where $A$ and $W$ are nonnegative matrices of size $n \times r$ and $r \times m$ respectively. In some applications, it makes sense to ask instead for the product $AW$ to approximate $M$, i.e. to (approximately) minimize $\|M - AW\|_F$, where $\|\cdot\|_F$ denotes the Frobenius norm; we refer to this as Approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where $A$ and $W$ are computed using a variety of local search heuristics. Vavasis proved that this problem is NP-complete. We initiate a study of when this problem is solvable in polynomial time: 1. We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$. Indeed NMF is most interesting in applications precisely when $r$ is small. 2. We complement this with a hardness result: if exact NMF can be solved in time $(nm)^{o(r)}$, then 3-SAT has a sub-exponential time algorithm. This rules out substantial improvements to the above algorithm. 3. We give an algorithm that runs in time polynomial in $n$, $m$ and $r$ under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings. To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input, and we believe that this will be an interesting and important direction for future work.
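As a point of reference for the local-search heuristics mentioned above, here is a minimal NumPy sketch of the classic Lee-Seung multiplicative updates for Approximate NMF under the Frobenius objective; it is a heuristic, not the paper's polynomial-time algorithm for constant $r$, and the names are illustrative.

import numpy as np

def nmf_multiplicative(M, r, iters=500, eps=1e-9, seed=0):
    """Local-search heuristic: minimize ||M - A W||_F with A, W entrywise nonnegative."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r))
    W = rng.random((r, m))
    for _ in range(iters):
        W *= (A.T @ M) / (A.T @ A @ W + eps)   # update W with A fixed
        A *= (M @ W.T) / (A @ W @ W.T + eps)   # update A with W fixed
    return A, W

M = np.random.default_rng(1).random((20, 30))   # nonnegative input matrix
A, W = nmf_multiplicative(M, r=4)
print("relative error:", np.linalg.norm(M - A @ W) / np.linalg.norm(M))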
Jukka Suomela (2014)
This work shows that the following problems are equivalent, both in theory and in practice:
- median filtering: given an $n$-element vector, compute the sliding window median with window size $k$,
- piecewise sorting: given an $n$-element vector, divide it into $n/k$ blocks of length $k$ and sort each block.
By prior work, median filtering is known to be at least as hard as piecewise sorting: with a single median filter operation we can sort $\Theta(n/k)$ blocks of length $\Theta(k)$. The present work shows that median filtering is also as easy as piecewise sorting: we can do median filtering with one piecewise sorting operation and linear-time postprocessing. In particular, median filtering can directly benefit from the vast literature on sorting algorithms; for example, adaptive sorting algorithms imply adaptive median filtering algorithms. The reduction is very efficient in practice: for random inputs the performance of the new sorting-based algorithm is on a par with the fastest heap-based algorithms, and for benign data distributions it typically outperforms prior algorithms. The key technical idea is that we can represent the sliding window with a pair of sorted doubly-linked lists: we delete items from one list and add items to the other list. Deletions are easy; additions can be done efficiently if we reverse time twice: first we construct the full list and delete the items in reverse order, then we undo each deletion with Knuth's dancing links technique.
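To make the median-filtering primitive concrete, here is a naive Python baseline that maintains the window as a sorted list; it is not the paper's reduction to piecewise sorting, nor its linked-list / dancing-links construction, and serves only to pin down what "sliding window median with window size $k$" computes.

import bisect

def median_filter_naive(x, k):
    window = sorted(x[:k])
    out = [window[k // 2]]          # median convention: upper median for even k
    for i in range(k, len(x)):
        window.pop(bisect.bisect_left(window, x[i - k]))   # drop the oldest element
        bisect.insort(window, x[i])                        # insert the newest element
        out.append(window[k // 2])
    return out

print(median_filter_naive([5, 2, 8, 1, 9, 3, 7, 4], k=3))  # -> [5, 2, 8, 3, 7, 4]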
Existing tensor factorization methods assume that the input tensor follows some specific distribution (e.g. Poisson, Bernoulli, or Gaussian), and solve the factorization by minimizing an empirical loss function defined based on the corresponding distribution. However, this approach suffers from several drawbacks: 1) In reality, the underlying distributions are complicated and unknown, making it infeasible to approximate them by a simple distribution. 2) The correlation across dimensions of the input tensor is not well utilized, leading to sub-optimal performance. Although heuristics were proposed to incorporate such correlation as side information under the Gaussian distribution, they cannot easily be generalized to other distributions. Thus, a more principled way of utilizing the correlation in tensor factorization models is still an open challenge. Without assuming any explicit distribution, we formulate tensor factorization as an optimal transport problem with Wasserstein distance, which can handle non-negative inputs. We introduce SWIFT, which minimizes the Wasserstein distance between the input tensor and its reconstruction. In particular, we define the N-th order tensor Wasserstein loss for the widely used tensor CP factorization and derive the optimization algorithm that minimizes it. By leveraging sparsity structure and different equivalent formulations for optimizing computational efficiency, SWIFT is as scalable as other well-known CP algorithms. Using the factor matrices as features, SWIFT achieves up to 9.65% and 11.31% relative improvement over baselines for downstream prediction tasks. Under noisy conditions, SWIFT achieves up to 15% and 17% relative improvements over the best competitors for the prediction tasks.
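For orientation, the following NumPy sketch computes an entropy-regularized Wasserstein distance between two nonnegative histograms with Sinkhorn iterations; SWIFT's N-th order tensor Wasserstein loss and its CP-factorization solver are more involved, so this only illustrates the kind of transport objective being minimized, with all names illustrative.

import numpy as np

def sinkhorn_distance(a, b, C, eps=0.05, iters=200):
    """a, b: nonnegative histograms summing to 1; C: ground cost matrix."""
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):           # alternate scaling to match the two marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # transport plan with marginals ~a and ~b
    return np.sum(P * C)             # transport cost <P, C>

n = 10
grid = np.arange(n)
C = np.abs(grid[:, None] - grid[None, :]).astype(float)   # |i - j| ground cost
a = np.ones(n) / n                                         # uniform histogram
b = np.zeros(n); b[7] = 1.0                                # point mass at index 7
print("approx W1:", sinkhorn_distance(a, b, C))            # exact W1 here is 3.1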
Deep convolutional neural networks (CNNs) are deployed in various applications but demand immense computational resources. Pruning techniques and Winograd convolution are two typical methods to reduce CNN computation. However, they cannot be directly combined because the Winograd transformation fills in the sparsity resulting from pruning. Li et al. (2017) propose sparse Winograd convolution, in which weights are pruned directly in the Winograd domain, but this technique is not very practical because Winograd-domain retraining requires low learning rates and hence significantly longer training time. Besides, Liu et al. (2018) move the ReLU function into the Winograd domain, which can help increase the weight sparsity but requires changes in the network structure. To achieve a high Winograd-domain weight sparsity without changing network structures, we propose a new pruning method, spatial-Winograd pruning. As the first step, spatial-domain weights are pruned in a structured way, which efficiently transfers the spatial-domain sparsity into the Winograd domain and avoids Winograd-domain retraining. For the next step, we also perform pruning and retraining directly in the Winograd domain but propose to use an importance factor matrix to adjust weight importance and weight gradients. This adjustment makes it possible to effectively retrain the pruned Winograd-domain network without changing the network structure. For the three models on the CIFAR-10, CIFAR-100, and ImageNet datasets, our proposed method achieves Winograd-domain sparsities of 63%, 50%, and 74%, respectively.
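As a toy illustration of the tension this abstract describes, here is a 1D Winograd F(2,3) example in NumPy: the filter is transformed once ($U = Gg$), each pair of outputs then costs 4 elementwise multiplies instead of 6, and pruning $U$ directly keeps zeros exactly where the multiplies happen, whereas zeros in the spatial filter are generally filled in by the transform. This is only an illustration, not the paper's spatial-Winograd pruning procedure.

import numpy as np

# Standard F(2,3) transform matrices (as in Lavin & Gray's Winograd convolution).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G  = np.array([[1.0, 0.0, 0.0],
               [0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0, 0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, U):
    """Two outputs of a 3-tap correlation from a 4-sample input tile d,
    given the Winograd-domain filter U (length 4)."""
    return AT @ (U * (BT @ d))          # the elementwise product = the 4 multiplies

g = np.array([2.0, -1.8, 0.1])          # spatial-domain filter
U = G @ g                               # Winograd-domain weights
U_pruned = np.where(np.abs(U) < 0.6, 0.0, U)   # magnitude pruning in the Winograd domain

d = np.array([1.0, 4.0, 2.0, 5.0])      # one input tile
print("direct   :", [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
print("winograd :", winograd_f23(d, U))
print("pruned U :", winograd_f23(d, U_pruned))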