We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that mix input tokens. These linear mixers, along with standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths, our FNet model is significantly faster: when compared to the "efficient Transformers" on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models outperform Transformer counterparts.
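As a concrete illustration, here is a minimal NumPy sketch of the token-mixing idea: the self-attention sublayer is replaced by the real part of a 2D FFT over the sequence and hidden dimensions, followed by a standard feed-forward sublayer. The function names, the toy dimensions, and the omission of layer normalization are simplifying assumptions for illustration, not the actual FNet implementation.

```python
import numpy as np

def fourier_mix(x):
    """Unparameterized token mixing: FFT along the hidden dimension,
    then along the sequence dimension, keeping only the real part.
    x: (seq_len, d_model) array of token embeddings."""
    return np.real(np.fft.fft2(x))

def feed_forward(x, w1, b1, w2, b2):
    """Standard position-wise feed-forward sublayer (ReLU for simplicity)."""
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

# Toy forward pass: one encoder block with residual connections.
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 8, 16, 32
x = rng.normal(size=(seq_len, d_model))
w1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)

h = x + fourier_mix(x)                   # attention sublayer replaced by an FFT
out = h + feed_forward(h, w1, b1, w2, b2)
print(out.shape)                         # (8, 16)
```

Note that the mixing step has no learned parameters at all; the only trainable weights in this sketch live in the feed-forward sublayer.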
Reconstructing continuous signals from a small number of discrete samples is a fundamental problem across science and engineering. In practice, we are often interested in signals with simple Fourier structure, such as bandlimited, multiband, and Fourier sparse signals. More broadly, any prior knowledge about a signal's Fourier power spectrum can constrain its complexity. Intuitively, signals with more highly constrained Fourier structure require fewer samples to reconstruct. We formalize this intuition by showing that, roughly, a continuous signal from a given class can be approximately reconstructed using a number of samples proportional to the *statistical dimension* of the allowed power spectrum of that class. Further, in nearly all settings, this natural measure tightly characterizes the sample complexity of signal reconstruction. Surprisingly, we also show that, up to logarithmic factors, a universal non-uniform sampling strategy can achieve this optimal complexity for *any class of signals*. We present a simple and efficient algorithm for recovering a signal from the samples taken. For bandlimited and sparse signals, our method matches the state-of-the-art. At the same time, it gives the first computationally and sample efficient solution to a broad range of problems, including multiband signal reconstruction, as well as kriging and Gaussian process regression tasks in one dimension. Our work is based on a novel connection between randomized linear algebra and signal reconstruction with constrained Fourier structure. We extend tools based on statistical leverage score sampling and column-based matrix reconstruction to the approximation of continuous linear operators that arise in signal reconstruction. We believe that these extensions are of independent interest and serve as a foundation for tackling a broad range of continuous time problems using randomized methods.
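A toy sketch of the sample-and-solve pipeline for the simplest case, a bandlimited signal, assuming NumPy: the endpoint-weighted (Chebyshev-style) sampling density and the plain least-squares solve below are simplified stand-ins for the leverage-score sampling and the recovery algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth bandlimited signal on [0, 1]: a few low Fourier modes.
freqs = np.arange(-5, 6)                        # band limit |f| <= 5
coeffs = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)

def signal(t):
    """Real part of a random trigonometric polynomial with the given band."""
    return np.real(np.exp(2j * np.pi * np.outer(t, freqs)) @ coeffs)

# Non-uniform sampling: denser near the interval endpoints, where the
# leverage scores of bandlimited signals concentrate (arcsine density).
m = 60
samples_t = 0.5 * (1 + np.cos(np.pi * rng.uniform(size=m)))
y = signal(samples_t)

# Recovery: least-squares fit of Fourier coefficients to the samples.
A = np.exp(2j * np.pi * np.outer(samples_t, freqs))
c_hat = np.linalg.lstsq(A, y.astype(complex), rcond=None)[0]

# Evaluate the reconstruction error on a fine grid.
grid = np.linspace(0, 1, 1000)
recon = np.real(np.exp(2j * np.pi * np.outer(grid, freqs)) @ c_hat)
print(np.max(np.abs(recon - signal(grid))))     # near machine precision
```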
The Fractional Fourier Transform (FrFT) has widespread applications in areas like signal analysis, Fourier optics, diffraction theory, etc. The Holomorphic Fractional Fourier Transform (HFrFT) proposed in the present paper may be used in the same wide range of applications with improved properties. The HFrFT of signals spans a one-parameter family of (essentially) holomorphic functions, where the parameter takes values in the bounded interval $t \in (0,\pi/2)$. At the boundary values of the parameter, one obtains the original signal at $t=0$ and its Fourier transform at the other end of the interval, $t=\pi/2$. If the initial signal is $L^2$, then, for an appropriate choice of inner product that will be detailed below, the transform is unitary for all values of the parameter in the interval. This transform provides a heat kernel smoothing of the signals while preserving unitarity for $L^2$-signals and continuously interpolating between the original signal and its Fourier transform.
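The HFrFT itself is an analytic construction, but the endpoint and unitarity behavior described above is easy to see numerically with a generic discrete fractional Fourier transform, obtained as a fractional power of the unitary DFT matrix. The NumPy sketch below illustrates only this interpolation property; it is not the authors' holomorphic transform.

```python
import numpy as np

def dft_matrix(n):
    """Unitary DFT matrix F (so F @ x is the normalized DFT of x)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n))
    return np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)

def frft_matrix(n, t):
    """Fractional power F**(2t/pi) via eigendecomposition: the identity at
    t=0 and the full DFT at t=pi/2. Branch choices inside the degenerate
    eigenspaces are a numerical convenience, not canonical."""
    w, v = np.linalg.eig(dft_matrix(n))
    return v @ np.diag(w ** (2 * t / np.pi)) @ np.linalg.inv(v)

n = 64
x = np.exp(-0.5 * ((np.arange(n) - n / 2) / 4.0) ** 2)   # Gaussian pulse

print(np.allclose(frft_matrix(n, 0.0) @ x, x))                        # t=0: signal
print(np.allclose(frft_matrix(n, np.pi / 2) @ x, dft_matrix(n) @ x))  # t=pi/2: DFT
# Intermediate t: still norm-preserving, interpolating between the two.
y = frft_matrix(n, np.pi / 4) @ x
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))
```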
In this work we verify the sufficiency of Jensen's necessary and sufficient condition for a class of genus 0 or 1 entire functions to have only real zeros. These functions are Fourier transforms of even, positive, infinitely differentiable, and very fast decreasing functions. We also apply our result to several important special functions in mathematics: for the modified Bessel function $K_{iz}(a)$, $a>0$, as a function of the variable $z$, the Riemann Xi function $\Xi(z)$, and the character Xi function $\Xi(z;\chi)$ when $\chi$ is a real primitive non-principal character satisfying $\varphi(u;\chi) \ge 0$ on the real line, we prove that these entire functions have only real zeros.
Spatial Semantic Pointers (SSPs) have recently emerged as a powerful tool for representing and transforming continuous space, with numerous applications to cognitive modelling and deep learning. Fundamental to SSPs is the notion of similarity between vectors representing different points in $n$-dimensional space -- typically the dot product or cosine similarity between vectors with rotated unit-length complex coefficients in the Fourier domain. The similarity measure has previously been conjectured to be a Gaussian function of Euclidean distance. Contrary to this conjecture, we derive a simple trigonometric formula relating spatial displacement to similarity, and prove that, in the case where the Fourier coefficients are uniform i.i.d., the expected similarity is a product of normalized sinc functions: $\prod_{k=1}^{n} \operatorname{sinc} \left( a_k \right)$, where $\mathbf{a} \in \mathbb{R}^n$ is the spatial displacement between the two $n$-dimensional points. This establishes a direct link between space and the similarity of SSPs, which in turn helps bolster a useful mathematical framework for architecting neural networks that manipulate spatial structures.
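The stated identity is straightforward to check by simulation. In this NumPy sketch (the specific encoding and sample sizes are illustrative assumptions), each point is encoded with unit-modulus Fourier coefficients $e^{i \langle \theta_j, \mathbf{x} \rangle}$, with frequencies $\theta_j$ drawn i.i.d. from $\mathrm{Uniform}(-\pi,\pi)^n$, and the empirical similarity is compared against the sinc product.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssp(phases, point):
    """SSP-style encoding: unit-modulus Fourier coefficients exp(i <theta_j, x>).
    phases: (d, n) matrix of i.i.d. Uniform(-pi, pi) frequencies."""
    return np.exp(1j * phases @ point)

n, d = 3, 200_000                        # spatial dimension, SSP dimension
phases = rng.uniform(-np.pi, np.pi, size=(d, n))

a = np.array([0.5, 0.25, 0.1])           # displacement between the two points
x = np.array([0.2, -0.4, 1.0])

# Empirical similarity: normalized real part of the complex dot product.
sim = np.real(np.vdot(ssp(phases, x), ssp(phases, x + a))) / d

# Predicted similarity: product of normalized sincs (np.sinc(t) = sin(pi t)/(pi t)).
predicted = np.prod(np.sinc(a))
print(sim, predicted)                    # agree to about 1/sqrt(d)
```

Note that the prediction depends only on the displacement $\mathbf{a}$, not on the base point $\mathbf{x}$, which is exactly the shift-invariance the trigonometric formula captures.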
We show that the adjunction counits of a Fourier-Mukai transform $\Phi$ from $D(X_1)$ to $D(X_2)$ arise from maps of the kernels of the corresponding Fourier-Mukai transforms. In a very general setting of proper separable schemes of finite type over a field we write down these maps of kernels explicitly -- facilitating the computation of the twist (the cone of an adjunction counit) of $\Phi$. We also give another description of these maps, better suited to computing cones if the kernel of $\Phi$ is a pushforward from a closed subscheme $Z$ of $X_1 \times X_2$. Moreover, we show that we can replace the condition of properness of the ambient spaces $X_1$ and $X_2$ by that of $Z$ being proper over them and still have this description apply as is. This can be used, for instance, to compute spherical twists on non-proper varieties directly and in full generality.