Learning with Neural Tangent Kernels in Near Input Sparsity Time


Abstract in English

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent. However, despite its importance, the super-quadratic runtime of kernel methods limits the use of NTK in large-scale learning tasks. To accelerate kernel machines with NTK, we propose a near input sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their NTK evaluation. Our transformation works by sketching the polynomial expansions of arc-cosine kernels. Furthermore, we propose a feature map for approximating the convolutional counterpart of the NTK, which can transform any image using a runtime that is only linear in the number of pixels. We show that in standard large-scale regression and classification tasks a linear regressor trained on our features outperforms trained Neural Nets and Nystrom approximation of NTK kernel.

Download