Linear dimensionality reduction methods are commonly used to extract low-dimensional structure from high-dimensional data. However, popular methods disregard temporal structure, rendering them prone to extracting noise rather than meaningful dynamics when applied to time series data. At the same time, many successful unsupervised learning methods for temporal, sequential and spatial data extract features which are predictive of their surrounding context. Combining these approaches, we introduce Dynamical Components Analysis (DCA), a linear dimensionality reduction method which discovers a subspace of high-dimensional time series data with maximal predictive information, defined as the mutual information between the past and future. We test DCA on synthetic examples and demonstrate its superior ability to extract dynamical structure compared to commonly used linear methods. We also apply DCA to several real-world datasets, showing that the dimensions extracted by DCA are more useful than those extracted by other methods for predicting future states and decoding auxiliary variables. Overall, DCA robustly extracts dynamical structure in noisy, high-dimensional data while retaining the computational efficiency and geometric interpretability of linear dimensionality reduction methods.
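As a concrete illustration of the objective (a minimal sketch under a Gaussian stationarity assumption, not the authors' reference implementation), the snippet below estimates the predictive information of a fixed linear projection of a multivariate time series; the array names, the window length `T`, and the helper itself are placeholders for the example.

```python
import numpy as np

def gaussian_predictive_information(X, V, T):
    """Estimate Gaussian predictive information of the projected series X @ V.

    X : (n_samples, n_features) time series
    V : (n_features, d) projection matrix (columns = extracted components)
    T : window length for the past/future blocks
    """
    Y = X @ V                                   # project to d dimensions
    n, d = Y.shape
    # stack 2T consecutive time steps into one vector per window
    windows = np.stack([Y[t:t + 2 * T].ravel() for t in range(n - 2 * T + 1)])
    cov = np.cov(windows, rowvar=False)         # (2*T*d, 2*T*d) joint covariance
    half = T * d
    # I(past; future) = H(past) + H(future) - H(past, future) for a Gaussian process
    _, logdet_p = np.linalg.slogdet(cov[:half, :half])
    _, logdet_f = np.linalg.slogdet(cov[half:, half:])
    _, logdet_j = np.linalg.slogdet(cov)
    return 0.5 * (logdet_p + logdet_f - logdet_j)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))             # toy high-dimensional series
V = rng.standard_normal((10, 3))                # toy 3-dimensional projection
print(gaussian_predictive_information(X, V, T=5))
```

DCA seeks the projection that maximizes this quantity; for a white-noise input like the toy series above, the estimate is near zero, reflecting the absence of dynamical structure.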
We devise a novel neural network-based universal denoiser for the finite-input, general-output (FIGO) channel. Based on the assumption of known noisy channel densities, which is realistic in many practical scenarios, we train the network so that it can denoise as well as the best sliding-window denoiser for any given underlying clean source data. Our algorithm, dubbed Generalized CUDE (Gen-CUDE), enjoys several desirable properties: it can be trained in an unsupervised manner (solely from the noisy observation data), has much lower computational complexity than the previously developed universal denoiser for the same setting, and admits a much tighter theoretical upper bound on the denoising performance. In our experiments, we show that this tighter upper bound is also realized in practice: Gen-CUDE achieves much better denoising results than other strong baselines on both synthetic and real underlying clean sequences.
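As a rough sketch of the sliding-window setting only (not the Gen-CUDE network or its unsupervised training procedure), the snippet below collects the length-2k contexts surrounding each noisy observation, which is the information a sliding-window denoiser conditions on; the window size `k` and the function name are assumptions.

```python
import numpy as np

def sliding_window_contexts(z, k):
    """Collect the left/right contexts of size k around each noisy sample z[i].

    Returns a (len(z) - 2k, 2k) array of contexts and the corresponding
    center observations; a sliding-window denoiser (or a network trained
    to mimic one) maps each context/center pair to a denoised symbol.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    contexts = np.empty((n - 2 * k, 2 * k))
    centers = z[k:n - k]
    for i in range(k, n - k):
        contexts[i - k] = np.concatenate([z[i - k:i], z[i + 1:i + k + 1]])
    return contexts, centers
```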
We present a detailed analysis of the unconstrained $\ell_1$-minimization (Lasso) method for sparse recovery from noisy data. The data are recovered from compressed measurements produced by a randomly generated class of sensing matrices satisfying a Restricted Isometry Property. We derive a new $\ell_1$-error estimate which highlights the dependence on a certain compressibility threshold: once the computed re-scaled residual crosses that threshold, the error is driven only by the (assumed small) noise and compressibility. We thus identify the re-scaled residual as the key quantity driving the error, and we derive a sharp lower bound on it of order the square root of the size of the support of the computed solution.
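For a concrete instance of the setting (a minimal sketch with assumed problem sizes and regularization), the snippet below solves the unconstrained Lasso problem $\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1$ for a random Gaussian sensing matrix, which satisfies a Restricted Isometry Property with high probability, and reports the residual of the computed solution.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m, s = 64, 256, 5                           # measurements, ambient dim, sparsity
A = rng.standard_normal((n, m)) / np.sqrt(n)   # Gaussian matrix: RIP w.h.p.
x_true = np.zeros(m)
x_true[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true + 0.01 * rng.standard_normal(n) # noisy compressed measurements

# scikit-learn minimizes (1 / (2 n)) ||y - A x||_2^2 + alpha ||x||_1,
# so alpha corresponds to lambda / n in the objective above
lasso = Lasso(alpha=0.005, fit_intercept=False, max_iter=50_000).fit(A, y)
x_hat = lasso.coef_
residual = np.linalg.norm(y - A @ x_hat)
print("l1 error:", np.linalg.norm(x_hat - x_true, 1), "residual:", residual)
```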
We consider the problem of computing a binary linear transformation when all circuit components are unreliable. Two noise models of unreliable components are considered: probabilistic errors and permanent errors. We introduce the ENCODED technique, which ensures that the error probability of computing the linear transformation stays below a small constant independent of the size of the transformation, even when all logic gates in the computation are noisy. Further, we show that the scheme requires fewer operations (in an order sense) than its uncoded counterpart. By deriving a lower bound, we show that in some cases the scheme is order-optimal. Using these results, we examine the gain in energy efficiency from a voltage-scaling scheme in which gate energy is reduced by lowering the supply voltage. We use a gate energy-reliability model to show that tuning gate energy appropriately at different stages of the computation (dynamic voltage scaling), in conjunction with ENCODED, can lead to order-sense energy savings over the classical uncoded approach. Finally, we also examine the problem of computing a linear transformation when noiseless decoders can be used, providing upper and lower bounds for this problem.
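To make the probabilistic-error model concrete (a toy simulation under assumed parameters, not the ENCODED scheme), the snippet below computes an uncoded GF(2) matrix-vector product in which every XOR gate output flips independently with probability `eps`.

```python
import numpy as np

def noisy_xor(a, b, eps, rng):
    """XOR gate whose output flips with probability eps (probabilistic errors)."""
    flip = int(rng.random() < eps)
    return (a ^ b) ^ flip

def noisy_matvec_gf2(M, x, eps, rng):
    """Uncoded GF(2) matrix-vector product built entirely from noisy XOR gates."""
    y = np.zeros(M.shape[0], dtype=int)
    for i in range(M.shape[0]):
        acc = 0
        for j in range(M.shape[1]):
            if M[i, j]:                        # each accumulation uses one noisy gate
                acc = noisy_xor(acc, int(x[j]), eps, rng)
        y[i] = acc
    return y

rng = np.random.default_rng(0)
k = 200
M = rng.integers(0, 2, size=(k, k))
x = rng.integers(0, 2, size=k)
y_noisy = noisy_matvec_gf2(M, x, eps=1e-3, rng=rng)
y_clean = (M @ x) % 2
print("fraction of wrong output bits:", np.mean(y_noisy != y_clean))
```

With `eps` fixed, the per-bit error probability of the uncoded computation grows with the number of accumulated gates, which is the failure mode that coded computation is designed to avoid.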
We consider the problem of duplicate detection in noisy and incomplete data: given a large data set in which each record has multiple entries (attributes), detect which distinct records refer to the same real-world entity. This task is complicated by noise (such as misspellings) and missing data, which can lead to records being different despite referring to the same entity. Our method consists of three main steps: creating a similarity score between records, grouping records together into unique entities, and refining the groups. We compare various methods for creating similarity scores between noisy records, considering different combinations of string matching, term frequency-inverse document frequency (TF-IDF) methods, and n-gram techniques. In particular, we introduce a vectorized soft TF-IDF method with an optional refinement step. We also discuss two methods for handling missing data when computing similarity scores. We test our method on the Los Angeles Police Department Field Interview Card data set, the Cora Citation Matching data set, and two sets of restaurant review data. The results show that methods using words as the basic units are preferable to those using 3-grams. Moreover, in some (but certainly not all) parameter ranges, soft TF-IDF methods can outperform the standard TF-IDF method. The results also confirm that our method for automatically determining the number of groups works well in many cases and allows for accurate results in the absence of a priori knowledge of the number of unique entities in the data set.
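As a minimal illustration of the similarity-score step only (standard word-level TF-IDF with cosine similarity, not the soft TF-IDF variant or the grouping and refinement steps), the snippet below scores a few hypothetical records; the record strings are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

records = [
    "John Smith 123 Main St Los Angeles",
    "Jon Smith 123 Main Street LA",
    "Maria Garcia 456 Oak Ave Pasadena",
]

# word-level TF-IDF (word tokens rather than 3-grams as the basic units)
tfidf = TfidfVectorizer(analyzer="word").fit_transform(records)
similarity = cosine_similarity(tfidf)          # pairwise record similarity scores
print(similarity.round(2))
```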
Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, the structure itself is typically learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach that infers a full posterior over structures, which more reliably captures the uncertainty of the model.
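For context, compositional kernels are built by summing and multiplying simple base kernels; the sketch below (using scikit-learn, not any implementation from the paper) fits a Gaussian process with one such candidate structure, which a search over structures, whether greedy or fully Bayesian, would compare against alternatives.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)[:, None]
y = np.sin(2 * np.pi * t[:, 0]) + 0.05 * t[:, 0] ** 2 + 0.1 * rng.standard_normal(200)

# one candidate compositional structure: (periodic * smooth trend) + noise
kernel = ExpSineSquared(length_scale=1.0, periodicity=1.0) * RBF(length_scale=5.0) \
         + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)
mean, std = gp.predict(t, return_std=True)     # posterior mean and variance estimates
print(gp.kernel_)                              # hyperparameters after marginal-likelihood fit
```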