
Johnson-Lindenstrauss Lemma, Linear and Nonlinear Random Projections, Random Fourier Features, and Random Kitchen Sinks: Tutorial and Survey

 Added by Benyamin Ghojogh
 Publication date 2021
Research language: English





This is a tutorial and survey paper on the Johnson-Lindenstrauss (JL) lemma and linear and nonlinear random projections. We start with linear random projection and then justify its correctness by the JL lemma and its proof. Then, sparse random projections with the $\ell_1$ norm and the interpolation norm are introduced. Two main applications of random projection, which are low-rank matrix approximation and approximate nearest neighbor search by random projection onto a hypercube, are explained. Random Fourier Features (RFF) and Random Kitchen Sinks (RKS) are explained as methods for nonlinear random projection. Some other methods for nonlinear random projection, including the extreme learning machine, randomly weighted neural networks, and ensembles of random projections, are also introduced.
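As a rough illustration of the linear random projection mentioned above, the following sketch projects a few points from 500 to 200 dimensions and measures how much pairwise Euclidean distances change. It assumes one common construction (a dense matrix with i.i.d. Gaussian entries of variance 1/k); the function names, dimensions, and seed are illustrative choices and are not taken from the paper.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Multiply the projection matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


rng = random.Random(0)   # fixed seed so the run is repeatable
d, k, n = 500, 200, 8    # original dimension, projected dimension, number of points

points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
R = random_projection_matrix(d, k, rng)
projected = [project(R, x) for x in points]

# Ratio of projected distance to original distance for every pair of points.
ratios = [
    distance(projected[i], projected[j]) / distance(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
worst_distortion = max(abs(r - 1.0) for r in ratios)
print(f"largest relative distance distortion: {worst_distortion:.3f}")
```

Increasing k should tighten the ratios around 1, which is the behavior the JL lemma quantifies.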




Read More

Xiaoyun Li, Ping Li (2021)
The method of random projection (RP) is the standard technique in machine learning and many other areas, for dimensionality reduction, approximate near neighbor search, compressed sensing, etc. Basically, RP provides a simple and effective scheme for approximating pairwise inner products and Euclidean distances in massive data. Closely related to RP, the method of random Fourier features (RFF) has also become popular, for approximating the Gaussian kernel. RFF applies a specific nonlinear transformation on the projected data from random projections. In practice, using the (nonlinear) Gaussian kernel often leads to better performance than the linear kernel (inner product), partly due to the tuning parameter $\gamma$ introduced in the Gaussian kernel. Recently, there has been a surge of interest in studying properties of RFF. After random projections, quantization is an important step for efficient data storage, computation, and transmission. Quantization for RP has also been extensively studied in the literature. In this paper, we focus on developing quantization algorithms for RFF. The task is in a sense challenging due to the tuning parameter $\gamma$ in the Gaussian kernel. For example, the quantizer and the quantized data might be tied to each specific tuning parameter $\gamma$. Our contribution begins with an interesting discovery, that the marginal distribution of RFF is actually free of the Gaussian kernel parameter $\gamma$. This small finding significantly simplifies the design of the Lloyd-Max (LM) quantization scheme for RFF in that there would be only one LM quantizer for RFF (regardless of $\gamma$). We also develop a variant named LM$^2$-RFF quantizer, which in certain cases is more accurate. Experiments confirm that the proposed quantization schemes perform well.
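To make the relation between RFF and the Gaussian kernel concrete, here is a minimal sketch of the standard cosine feature map. It assumes the parametrization $k(x, y) = \exp(-\gamma^2 \|x - y\|^2 / 2)$, which may differ from the one used in the paper above, and all names (`make_rff`, `gaussian_kernel`) and sizes are hypothetical. Quantization is not covered here.

```python
import math
import random


def make_rff(d, num_features, gamma, rng):
    """Build a feature map z(x) = sqrt(2/D) * cos(W x + b).

    Assumed parametrization: rows of W are drawn from N(0, gamma^2 I)
    and each offset b_j is uniform on [0, 2*pi].
    """
    W = [[rng.gauss(0.0, gamma) for _ in range(d)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [
            scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)
        ]

    return features


def gaussian_kernel(x, y, gamma):
    """Exact kernel under the assumed form exp(-gamma^2 * ||x - y||^2 / 2)."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-0.5 * gamma ** 2 * sq_dist)


rng = random.Random(0)                 # fixed seed so the run is repeatable
d, num_features, gamma = 5, 4000, 0.7  # illustrative sizes only

x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = [rng.gauss(0.0, 1.0) for _ in range(d)]

phi = make_rff(d, num_features, gamma, rng)
approx = sum(a * c for a, c in zip(phi(x), phi(y)))  # inner product of features
exact = gaussian_kernel(x, y, gamma)
print(f"RFF estimate: {approx:.4f}, exact kernel: {exact:.4f}")
```

The inner product of the two feature vectors is an unbiased estimate of the kernel value under this parametrization, and its error shrinks roughly like one over the square root of the number of features.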
We introduce a novel random projection technique for efficiently reducing the dimension of very high-dimensional tensors. Building upon classical results on Gaussian random projections and Johnson-Lindenstrauss transforms (JLT), we propose two tensorized random projection maps relying on the tensor train (TT) and CP decomposition format, respectively. The two maps offer very low memory requirements and can be applied efficiently when the inputs are low rank tensors given in the CP or TT format. Our theoretical analysis shows that the dense Gaussian matrix in JLT can be replaced by a low-rank tensor implicitly represented in compressed form with random factors, while still approximately preserving the Euclidean distance of the projected inputs. In addition, our results reveal that the TT format is substantially superior to CP in terms of the size of the random projection needed to achieve the same distortion ratio. Experiments on synthetic data validate our theoretical analysis and demonstrate the superiority of the TT decomposition.
Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral approximation, and show how spectral matrix approximation bounds imply statistical guarantees for kernel ridge regression. Qualitatively, our results are twofold: on the one hand, we show that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions. At the same time, we show that the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance. We study this optimal sampling distribution for the Gaussian kernel, achieving a nearly complete characterization for the case of low-dimensional bounded datasets. Based on this characterization, we propose an efficient sampling scheme with guarantees superior to random Fourier features in this regime.
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel matrix (as it does when $N \to \infty$ alone), but it still has a tractable behavior that is captured by our analysis. This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$. Based on these estimates, a precise characterization of two qualitatively different phases of learning, including the phase transition between them, is provided; and the corresponding double descent test error curve is derived from this phase transition behavior. These results do not depend on strong assumptions on the data distribution, and they perfectly match empirical results on real-world data sets.
The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution. We show that RF-softmax leads to low bias in estimation in terms of both the full softmax distribution and the full softmax gradient. Furthermore, the cost of RF-softmax scales only logarithmically with the number of classes.
