We give a fast oblivious L2-embedding of $A\in\mathbb{R}^{n\times d}$ to $B\in\mathbb{R}^{r\times d}$ satisfying $(1-\varepsilon)\|Ax\|_2^2 \le \|Bx\|_2^2 \le (1+\varepsilon)\|Ax\|_2^2$. Our embedding dimension $r$ equals $d$, a constant independent of the distortion $\varepsilon$. We use as a black-box any L2-embedding $\Pi^T A$ and inherit its runtime and accuracy, effectively decoupling the dimension $r$ from runtime and accuracy, allowing downstream machine learning applications to benefit from both a low dimension and high accuracy (in prior embeddings higher accuracy means higher dimension). We give applications of our L2-embedding to regression, PCA and statistical leverage scores. We also give applications to L1: 1) an oblivious L1-embedding with dimension $d+O(d\ln^{1+\eta} d)$ and distortion $O((d\ln d)/\ln\ln d)$, with application to constructing well-conditioned bases; 2) fast approximation of L1-Lewis weights using our L2-embedding to quickly approximate L2-leverage scores.
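As a rough illustration of how the row count can be decoupled from the accuracy of a black-box sketch, the snippet below takes any sketch $\Pi^T A$ with $m \ge d$ rows and keeps only the $d \times d$ factor $R$ of its QR decomposition, so that $\|Rx\|_2 = \|\Pi^T A x\|_2$ for every $x$. This is a minimal sketch under stated assumptions, not necessarily the construction used in the paper: the Gaussian matrix `Pi_T`, the sizes `n`, `d`, `m`, and the helper name `compress_sketch_to_d_rows` are all hypothetical choices made for the example.

```python
import numpy as np

def compress_sketch_to_d_rows(sketched):
    """Return the d x d triangular factor R of an m x d sketch (m >= d).

    Because Q in sketched = Q R has orthonormal columns,
    ||R x||_2 equals ||sketched x||_2 for every x.
    """
    return np.linalg.qr(sketched, mode="r")

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 400                      # hypothetical sizes

A = rng.standard_normal((n, d))
# Assumption: a dense Gaussian sketch stands in for the black-box embedding.
Pi_T = rng.standard_normal((m, n)) / np.sqrt(m)
sketched = Pi_T @ A                         # m x d, accuracy set by m
B = compress_sketch_to_d_rows(sketched)     # d x d, same quadratic form

x = rng.standard_normal(d)
ratio = np.linalg.norm(B @ x) ** 2 / np.linalg.norm(A @ x) ** 2
print(B.shape, ratio)
```

In this toy setup the distortion is governed by the sketch size `m`, while the stored matrix `B` always has `d` rows.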
Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive, potentially occupying hundreds of gigabytes in large-scale settings. To help manage this outsized memory consumption, we explore mixed dimension embeddings, an embedding layer architecture in which a particular embedding vector's dimension scales with its query frequency. Through theoretical analysis and systematic experiments, we demonstrate that using mixed dimensions can drastically reduce memory usage while maintaining and even improving ML performance. Empirically, we show that the proposed mixed dimension layers improve accuracy by 0.1% using half as many parameters, or maintain it using 16X fewer parameters, on the click-through rate prediction task on the Criteo Kaggle dataset.
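The idea of letting the embedding dimension shrink with query frequency can be sketched as follows. This is only an illustrative layout, assuming items are sorted by decreasing frequency and split into blocks, each with its own narrower table and a projection back to a shared base dimension; the class name `MixedDimEmbedding`, the block sizes, and the dimensions are hypothetical and not taken from the paper.

```python
import numpy as np

class MixedDimEmbedding:
    """Embedding layer whose blocks of rows use different dimensions."""

    def __init__(self, block_sizes, block_dims, base_dim, seed=0):
        rng = np.random.default_rng(seed)
        # offsets[b] is the first global index belonging to block b.
        self.offsets = np.cumsum([0] + list(block_sizes))
        self.tables = [rng.standard_normal((n, d)) * 0.01
                       for n, d in zip(block_sizes, block_dims)]
        # Narrow blocks get a d x base_dim projection; full-width blocks get none.
        self.projections = [None if d == base_dim
                            else rng.standard_normal((d, base_dim)) / np.sqrt(d)
                            for d in block_dims]

    def lookup(self, index):
        block = int(np.searchsorted(self.offsets, index, side="right")) - 1
        row = self.tables[block][index - self.offsets[block]]
        proj = self.projections[block]
        return row if proj is None else row @ proj

    def num_parameters(self):
        tables = sum(t.size for t in self.tables)
        projections = sum(p.size for p in self.projections if p is not None)
        return tables + projections

# Hypothetical split: 100 hot items, 1,000 warm items, 10,000 cold items.
layer = MixedDimEmbedding(block_sizes=[100, 1000, 10000],
                          block_dims=[16, 8, 4], base_dim=16)
uniform_parameters = (100 + 1000 + 10000) * 16
print(layer.num_parameters(), uniform_parameters)
```

Every lookup returns a vector of the base dimension, so downstream layers are unchanged; the saving comes entirely from the narrow tables for rarely queried items.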
This letter analyzes the performance of a simple reconstruction method, namely the Projected Back-Projection (PBP), for estimating the direction of a sparse signal from its phase-only (or amplitude-less) complex Gaussian random measurements, i.e., an extension of one-bit compressive sensing to the complex field. To study the performance of this algorithm, we show that complex Gaussian random matrices respect, with high probability, a variant of the Restricted Isometry Property (RIP) relating the $\ell_1$-norm of the sparse signal measurements to their $\ell_2$-norm. This property allows us to upper-bound the reconstruction error of PBP in the presence of phase noise. Monte Carlo simulations are performed to highlight the performance of our approach in this phase-only acquisition model when compared to the error achieved by PBP in classical compressive sensing.
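A minimal numerical sketch of the back-project-then-threshold idea is given below, assuming the estimate is formed by applying $A^*$ to the phase-only measurements, keeping the $k$ largest-magnitude entries, and normalizing to unit norm. The function names, the scaling by the number of measurements, and the problem sizes are assumptions for illustration and may differ from the letter's exact definition of PBP.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def projected_back_projection(A, z, k):
    """Back-project the measurements z, project onto k-sparse vectors, normalize."""
    back = A.conj().T @ z / A.shape[0]
    estimate = hard_threshold(back, k)
    return estimate / np.linalg.norm(estimate)

rng = np.random.default_rng(1)
n, m, k = 100, 2000, 3                      # hypothetical sizes

# k-sparse complex signal with unit-modulus nonzero entries.
x = np.zeros(n, dtype=complex)
support = rng.choice(n, size=k, replace=False)
x[support] = np.exp(1j * rng.uniform(0, 2 * np.pi, size=k))

# Complex Gaussian sensing matrix with unit-variance entries.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

y = A @ x
z = y / np.abs(y)                           # phase-only: amplitudes are discarded

x_hat = projected_back_projection(A, z, k)
alignment = np.abs(np.vdot(x_hat, x / np.linalg.norm(x)))
print(alignment)
```

Since the amplitudes are discarded, only the direction $x/\|x\|_2$ can be estimated, which is why the example compares unit-norm vectors.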
Compressed Sensing aims to capture attributes of a sparse signal using very few measurements. Candès and Tao showed that sparse reconstruction is possible if the sensing matrix acts as a near isometry on all $\boldsymbol{k}$-sparse signals. This property holds with overwhelming probability if the entries of the matrix are generated by an iid Gaussian or Bernoulli process. There has been significant recent interest in an alternative signal processing framework that exploits deterministic sensing matrices which, with overwhelming probability, act as a near isometry on $\boldsymbol{k}$-sparse vectors with uniformly random support, a geometric condition that is called the Statistical Restricted Isometry Property or StRIP. This paper considers a family of deterministic sensing matrices satisfying the StRIP that are based on srm codes (binary chirps) and a $\boldsymbol{k}$-sparse reconstruction algorithm with sublinear complexity. In the presence of stochastic noise in the data domain, this paper derives bounds on the $\boldsymbol{\ell_2}$ accuracy of approximation in terms of the $\boldsymbol{\ell_2}$ norm of the measurement noise and the accuracy of the best $\boldsymbol{k}$-sparse approximation, also measured in the $\boldsymbol{\ell_2}$ norm. This type of $\boldsymbol{\ell_2/\ell_2}$ bound is tighter than the standard $\boldsymbol{\ell_2/\ell_1}$ or $\boldsymbol{\ell_1/\ell_1}$ bounds.
Clustering is essential to many tasks in pattern recognition and computer vision. With the advent of deep learning, there is an increasing interest in learning deep unsupervised representations for clustering analysis. Many works in this domain rely on variants of auto-encoders and use the encoder outputs as representations/features for clustering. In this paper, we show that an $\ell_2$ normalization constraint on these representations during auto-encoder training makes the representations more separable and compact in the Euclidean space after training. This greatly improves the clustering accuracy when k-means clustering is employed on the representations. We also propose a clustering-based unsupervised anomaly detection method using $\ell_2$-normalized deep auto-encoder representations. We show the effect of $\ell_2$ normalization on anomaly detection accuracy. We further show that the proposed anomaly detection method greatly improves accuracy compared to previously proposed deep methods such as reconstruction-error-based anomaly detection.
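The normalization operation itself is simple to state in code. The snippet below only shows the projection of feature vectors onto the unit sphere, which is what an $\ell_2$ normalization constraint enforces on each representation; it does not reproduce the auto-encoder training or the clustering pipeline, and the function name `l2_normalize` and the toy `codes` array are hypothetical.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each row of `features` to unit Euclidean norm."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows.
    return features / np.maximum(norms, eps)

# Toy encoder outputs: the first two rows differ only in scale.
codes = np.array([[3.0, 4.0],
                  [0.3, 0.4],
                  [-5.0, 0.0]])
unit_codes = l2_normalize(codes)
print(unit_codes)
```

After normalization, vectors that differ only in magnitude coincide, so Euclidean distances between rows depend only on the angle between them.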
This paper studies semantic parsing for interlanguage (L2), taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We first manually annotate the semantic roles for a set of learner texts to derive a gold standard for automatic SRL. Based on the new data, we then evaluate three off-the-shelf SRL systems, i.e., the PCFGLA-parser-based, neural-parser-based and neural-syntax-agnostic systems, to gauge how successful SRL for learner Chinese can be. We find two non-obvious facts: 1) the L1-sentence-trained systems perform rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. Finally, the paper introduces a new agreement-based model to explore the semantic coherency information in the large-scale L2-L1 parallel data. We then show that such information is very effective in enhancing SRL for learner texts. Our model achieves an F-score of 72.06, which is a 2.02 point improvement over the best baseline.