This paper considers the problem of recovering an unknown sparse $p \times p$ matrix $X$ from an $m \times m$ matrix $Y = AXB^T$, where $A$ and $B$ are known $m \times p$ matrices with $m \ll p$. The main result shows that there exist constructions of the sketching matrices $A$ and $B$ so that even if $X$ has $O(p)$ non-zeros, it can be recovered exactly and efficiently using a convex program as long as these non-zeros are not concentrated in any single row/column of $X$. Furthermore, it suffices for the size of $Y$ (the sketch dimension) to scale as $m = O(\sqrt{\#\text{ nonzeros in } X} \times \log p)$. The results also show that the recovery is robust and stable in the sense that if $X$ is equal to a sparse matrix plus a perturbation, then the convex program we propose produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it assumes a tensor product structure). Therefore, proving recovery guarantees requires non-standard techniques. Indeed, our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest. This problem is motivated by the following application, among others. Consider a $p \times n$ data matrix $D$, consisting of $n$ observations of $p$ variables. Assume that the correlation matrix $X := DD^T$ is (approximately) sparse in the sense that each of the $p$ variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access to only a sketch of the data $S = AD$ with $A \in \mathbb{R}^{m \times p}$.
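The correlation application rests on a simple identity: taking the second sketching matrix equal to the first, the data sketch $S = AD$ satisfies $SS^T = A(DD^T)A^T = AXA^T$, so the sketch of $X$ can be formed without ever building $X$. The following plain-Python check is only an illustration of that identity. The helper names, the small dimensions, and the random 0/1 choice of $A$ are assumptions made for the example, and the convex recovery step is not shown.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

def sketch_identity_gap(p=6, n=4, m=3, seed=0):
    """Largest entrywise gap between (AD)(AD)^T and A(DD^T)A^T."""
    rng = random.Random(seed)
    # Hypothetical integer data (p variables, n observations).
    D = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(p)]
    # Hypothetical 0/1 sketching matrix; the paper's construction is not reproduced.
    A = [[rng.randint(0, 1) for _ in range(p)] for _ in range(m)]
    X = matmul(D, transpose(D))                      # p x p correlation matrix
    S = matmul(A, D)                                 # m x n sketch of the data
    y_from_sketch = matmul(S, transpose(S))          # m x m, uses only S
    y_from_x = matmul(matmul(A, X), transpose(A))    # m x m, uses the full X
    return max(abs(u - v)
               for row_u, row_v in zip(y_from_sketch, y_from_x)
               for u, v in zip(row_u, row_v))

if __name__ == "__main__":
    print(sketch_identity_gap())
```

Integer entries keep the arithmetic exact, so the two $m \times m$ matrices agree entry by entry.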
Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. A standard approach to speed up this task is to compute sketches of the DNA reads (typically via hashing-based techniques) that allow the efficient computation of pairwise alignment scores. We propose a rate-distortion framework to study the problem of computing sketches that achieve the optimal tradeoff between sketch size and alignment estimation distortion. We consider the simple setting of i.i.d. error-free sources of length $n$ and introduce a new sketching algorithm called locational hashing. While standard approaches in the literature based on min-hashes require $B = (1/D) \cdot O(\log n)$ bits to achieve a distortion $D$, our proposed approach only requires $B = \log^2(1/D) \cdot O(1)$ bits. This can lead to significant computational savings in pairwise alignment estimation.
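For readers unfamiliar with the min-hash baseline mentioned above, a minimal sketch of the idea follows. It estimates the Jaccard similarity between the k-mer sets of two reads, which is a common proxy and not the alignment score or the locational hashing scheme studied in the paper. The function names, the k-mer length, the number of hash functions, and the use of seeded BLAKE2b hashes are all assumptions made for illustration.

```python
import hashlib

def kmers(read, k=4):
    """Set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}

def seeded_hash(kmer, seed):
    """64-bit hash of a k-mer under a given seed (illustrative choice)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_sketch(read, num_hashes=64, k=4):
    """For each seed, keep the smallest hash value over the read's k-mers."""
    ks = kmers(read, k)
    return [min(seeded_hash(x, seed) for x in ks) for seed in range(num_hashes)]

def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of positions where the two sketches agree."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)

if __name__ == "__main__":
    r1 = "ACGTACGTTGCAAGCTTAGC"
    r2 = "ACGTACGTTGCAAGCTTAGG"
    print(estimate_jaccard(minhash_sketch(r1), minhash_sketch(r2)))
```

Each sketch position agrees with probability equal to the Jaccard similarity of the two k-mer sets, so averaging more positions tightens the estimate at the cost of a larger sketch.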
Golay complementary sequences are highly valued in orthogonal frequency-division multiplexing (OFDM) systems because of their good peak-to-mean envelope power ratio (PMEPR) properties. However, as the code length increases, the code rate of the standard Golay sequences declines dramatically. Even though much effort has been devoted to the code rate problem for OFDM applications, constructing large classes of sequences with low PMEPR remains a difficult open problem. In this paper, we propose a new method to construct $q$-ary Golay complementary sets of size $N$ and length $N^n$ from $N \times N$ Hadamard matrices, where $n$ is arbitrary and $N$ is a power of 2. Every entry of the constructed sequences can be expressed as a product of specific entries of the Hadamard matrices. The previous work in \cite{BudIT} can be regarded as a special case of the constructions in this paper, and we also obtain new quaternary Golay sets not previously reported in the literature.
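As background, the defining property of a complementary set is that the aperiodic autocorrelations of its sequences sum to zero at every non-zero shift. The snippet below checks this for binary pairs (a set of size 2) built by the classical concatenation recursion $(a, b) \mapsto (a\,|\,b,\; a\,|\,{-b})$. This is a textbook construction used here only as an illustration; it is not the Hadamard-matrix construction proposed in the paper, and the function names are assumptions.

```python
def aperiodic_autocorrelation(seq, shift):
    """Aperiodic autocorrelation of a real sequence at a non-negative shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))

def golay_pair(num_doublings):
    """Binary Golay pair of length 2**num_doublings via concatenation."""
    a, b = [1], [1]
    for _ in range(num_doublings):
        # (a, b) -> (a | b, a | -b) preserves complementarity.
        a, b = a + b, a + [-x for x in b]
    return a, b

def is_complementary(a, b):
    """True if the autocorrelations of a and b cancel at every non-zero shift."""
    return all(
        aperiodic_autocorrelation(a, u) + aperiodic_autocorrelation(b, u) == 0
        for u in range(1, len(a))
    )

if __name__ == "__main__":
    a, b = golay_pair(3)
    print(a, b, is_complementary(a, b))
```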
Advances in the information-theoretic understanding of sparse sampling of continuous uncoded signals at sampling rates exceeding the Landau rate were reported in recent works. This work examines sparse sampling of coded signals at sub-Landau sampling rates. It is shown that with coded signals the Landau condition may be relaxed, and the sampling rate required for signal reconstruction and for support detection can be lower than the effective bandwidth. Equivalently, the number of measurements in the corresponding sparse sensing problem can be smaller than the support size. Tight bounds on information rates and on signal and support detection performance are derived for the Gaussian sparsely sampled channel and for the frequency-sparse channel within the context of state-dependent channels. Support detection results are verified by a simulation. When the system is high-dimensional, the required SNR is shown to be finite but high, and it rises as the sampling rate decreases. In some practical applications it can be lowered by reducing the a priori uncertainty about the support, e.g., by concentrating the frequency support into a finite number of subbands.
Sparse Principal Component Analysis (PCA) is a dimensionality reduction technique wherein one seeks a low-rank representation of a data matrix with additional sparsity constraints on the obtained representation. We consider two probabilistic formulations of sparse PCA: a spiked Wigner and spiked Wishart (or spiked covariance) model. We analyze an Approximate Message Passing (AMP) algorithm to estimate the underlying signal and show, in the high dimensional limit, that the AMP estimates are information-theoretically optimal. As an immediate corollary, our results demonstrate that the posterior expectation of the underlying signal, which is often intractable to compute, can be obtained using a polynomial-time scheme. Our results also effectively provide a single-letter characterization of the sparse PCA problem.
Let $f:\{-1,1\}^n \to \mathbb{R}$ be a polynomial with at most $s$ non-zero real coefficients. We give an algorithm for exactly reconstructing $f$ given random examples from the uniform distribution on $\{-1,1\}^n$ that runs in time polynomial in $n$ and $2^s$ and succeeds if the function satisfies the unique sign property: there is one output value which corresponds to a unique set of values of the participating parities. This sufficient condition is satisfied when every coefficient of $f$ is perturbed by a small random noise, and it is satisfied with high probability when the $s$ parity functions are chosen randomly or when all the coefficients are positive. Learning sparse polynomials over the Boolean domain in time polynomial in $n$ and $2^s$ is considered notoriously hard in the worst case. Our result shows that the problem is tractable for almost all sparse polynomials. We then show an application of this result to hypergraph sketching, which is the problem of learning a sparse (both in the number of hyperedges and the size of the hyperedges) hypergraph from uniformly drawn random cuts. We also provide experimental results on a real-world dataset.
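To make the objects concrete: a sparse polynomial over $\{-1,1\}^n$ is a weighted sum of a few parities $\chi_S(x) = \prod_{i \in S} x_i$, and the coefficient of $\chi_S$ equals the average of $f(x)\chi_S(x)$ over the uniform distribution. The snippet below illustrates this standard fact by exhaustive enumeration on a tiny example. It is not the reconstruction algorithm of the paper, which works from random examples, and the names and the example polynomial are assumptions.

```python
from itertools import product

def parity(x, subset):
    """Value of the parity chi_S(x) = product of x[i] for i in S."""
    result = 1
    for i in subset:
        result *= x[i]
    return result

def evaluate(coeffs, x):
    """Evaluate a sparse polynomial given as {subset (tuple): coefficient}."""
    return sum(c * parity(x, subset) for subset, c in coeffs.items())

def fourier_coefficient(f, n, subset):
    """Exact coefficient of chi_S, averaging f(x) * chi_S(x) over all 2**n inputs."""
    total = sum(f(x) * parity(x, subset) for x in product((-1, 1), repeat=n))
    return total / 2 ** n

if __name__ == "__main__":
    # Hypothetical 3-sparse polynomial on n = 3 variables.
    coeffs = {(0, 1): 2.0, (2,): -1.5, (): 0.5}
    f = lambda x: evaluate(coeffs, x)
    for subset in [(0, 1), (2,), (), (0,)]:
        print(subset, fourier_coefficient(f, 3, subset))
```

Exhaustive averaging costs $2^n$ evaluations, which is why it serves only as a sanity check on small $n$.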