Linear principal component analysis (PCA) can be extended to nonlinear PCA by using artificial neural networks. However, the benefit of curved components requires careful control of model complexity. Moreover, standard techniques for model selection, including cross-validation and, more generally, the use of an independent test set, fail when applied to nonlinear PCA because of its inherently unsupervised character. This paper presents a new approach for validating the complexity of nonlinear PCA models that uses the error in missing-data estimation as the criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test-set validation usually favours over-fitted nonlinear PCA models, the proposed validation approach correctly selects the optimal model complexity.
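A minimal sketch of the selection idea, assuming an MLP autoencoder as a stand-in for the paper's NLPCA network and simple mean imputation of the artificially removed entries (the paper's approach estimates missing values natively); all names, sizes, and rates below are illustrative:

```python
# Sketch: select nonlinear PCA complexity by missing-data estimation error.
# Candidate models of growing complexity are fit, a fraction of held-out
# entries is hidden, and the complexity that reconstructs those hidden
# entries best is chosen.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                    # toy data set
X_train, X_test = X[:400], X[400:]

mask = rng.random(X_test.shape) < 0.2             # 20% of test entries "missing"
X_test_imp = np.where(mask, X_train.mean(axis=0), X_test)   # naive imputation

errors = {}
for hidden in (2, 5, 10, 20, 40):                 # candidate model complexities
    ae = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
    ae.fit(X_train, X_train)                      # autoencoder: input == target
    X_hat = ae.predict(X_test_imp)
    errors[hidden] = np.mean((X_hat[mask] - X_test[mask]) ** 2)  # error on missing entries only

best = min(errors, key=errors.get)                # lowest missing-data error wins
print(errors, "-> selected complexity:", best)
```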
We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifolds. Our developments nicely complement existing methods for this problem, which either require $O(d^3)$ time complexity per iteration with an $O(\frac{1}{\sqrt{t}})$ convergence rate (where $d$ is the dimensionality) or only extract the top $1$ component with an $O(\frac{1}{t})$ convergence rate. In contrast, our algorithm offers a strict improvement for this classical problem: it achieves $O(d^2k)$ runtime complexity per iteration for extracting the top $k$ canonical components with an $O(\frac{1}{t})$ convergence rate. While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising. We also explore a potential application in training fair models where the label of the protected attribute is missing or otherwise unavailable.
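As a rough illustration of the Riemannian viewpoint (not the paper's RSG+ updates or its structured reparametrization), the sketch below runs stochastic gradient ascent for top-$k$ CCA on a product of Stiefel manifolds, assuming pre-whitened data so that the objective reduces to $\mathrm{tr}(U^\top \Sigma_{xy} V)$; step sizes, batch sizes, and function names are assumptions of this sketch:

```python
# Illustrative Riemannian SGD for top-k CCA on pre-whitened data,
# i.e. maximize tr(U^T S_xy V) subject to U^T U = I and V^T V = I.
import numpy as np

def retract(M):
    """QR-based retraction back onto the Stiefel manifold."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))          # fix column-sign ambiguity

def riemannian_grad(G, U):
    """Project the Euclidean gradient G onto the tangent space at U."""
    return G - U @ (U.T @ G + G.T @ U) / 2

def rsg_cca_sketch(X, Y, k=2, lr=0.1, epochs=20, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    n, dx = X.shape
    dy = Y.shape[1]
    U = retract(rng.normal(size=(dx, k)))
    V = retract(rng.normal(size=(dy, k)))
    for _ in range(epochs):
        for i in range(0, n, batch):
            xb, yb = X[i:i + batch], Y[i:i + batch]
            Sxy = xb.T @ yb / len(xb)        # minibatch cross-covariance
            gU, gV = Sxy @ V, Sxy.T @ U      # Euclidean ascent directions
            U = retract(U + lr * riemannian_grad(gU, U))
            V = retract(V + lr * riemannian_grad(gV, V))
    return U, V                              # approximate top-k projections
```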
We study the statistical and computational aspects of kernel principal component analysis using random Fourier features and show that, under mild assumptions, $O(\sqrt{n} \log n)$ features suffice to achieve $O(1/\epsilon^2)$ sample complexity. Furthermore, we give a memory-efficient streaming algorithm based on the classical Oja's algorithm that achieves this rate.
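A compact sketch of the recipe described above, assuming an RBF kernel, a standard random Fourier feature map, and a block Oja update with periodic re-orthonormalization (the paper's exact step sizes and feature counts may differ):

```python
# Sketch: streaming kernel PCA = random Fourier features + Oja-style updates.
import numpy as np

def make_rff(d, m, gamma, seed=0):
    """Random Fourier feature map approximating exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))
    b = rng.uniform(0, 2 * np.pi, size=m)
    return lambda x: np.sqrt(2.0 / m) * np.cos(x @ W + b)

def streaming_kpca(stream, d, k=3, m=200, gamma=0.5, lr=0.05):
    phi = make_rff(d, m, gamma)
    Q = np.linalg.qr(np.random.default_rng(1).normal(size=(m, k)))[0]
    for x in stream:                        # single pass over the data stream
        z = phi(x.reshape(1, -1)).ravel()   # featurized sample, shape (m,)
        Q += lr * np.outer(z, z @ Q)        # Oja update: Q <- Q + lr * z z^T Q
        Q, _ = np.linalg.qr(Q)              # re-orthonormalize
    return Q                                # approximate top-k kernel PCs (O(mk) memory)
```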
Autonomous and semi-autonomous systems for safety-critical applications require rigorous testing before deployment. Due to the complexity of these systems, formal verification may be impossible and real-world testing may be dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a black box during testing. Safety validation tasks include finding disturbances to the system that cause it to fail (falsification), finding the most-likely failure, and estimating the probability that the system fails. Motivated by the prevalence of safety-critical artificial intelligence, this work provides a survey of state-of-the-art safety validation techniques with a focus on applied algorithms and their modifications for the safety validation problem. We present and discuss algorithms in the domains of optimization, path planning, reinforcement learning, and importance sampling. Problem decomposition techniques are presented to help scale algorithms to large state spaces, and a brief overview of safety-critical applications is given, including autonomous vehicles and aircraft collision avoidance systems. Finally, we present a survey of existing academic and commercially available safety validation tools.
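For concreteness, the sketch below illustrates the simplest version of the falsification task on a made-up black-box system: candidate disturbance sequences are sampled and the one yielding the smallest safety margin is kept, a negative margin signalling a discovered failure. The dynamics, disturbance model, and robustness metric are toy assumptions; the surveyed algorithms replace this random search with far more sample-efficient strategies:

```python
# Toy falsification by random search over disturbance sequences.
import numpy as np

def rollout(disturbance, horizon=50):
    """Hypothetical black-box system: returns the minimum safety margin."""
    state, margin = 1.0, np.inf
    for t in range(horizon):
        state += -0.05 * state + disturbance[t]   # simple closed-loop dynamics
        margin = min(margin, 2.0 - abs(state))    # failure when |state| >= 2
    return margin

def falsify_random_search(n_trials=2000, horizon=50, scale=0.2, seed=0):
    rng = np.random.default_rng(seed)
    best_d, best_margin = None, np.inf
    for _ in range(n_trials):
        d = rng.normal(scale=scale, size=horizon)  # candidate disturbance
        m = rollout(d, horizon)
        if m < best_margin:
            best_d, best_margin = d, m
    return best_d, best_margin                     # margin < 0 => failure found

d, m = falsify_random_search()
print("best margin:", m, "(negative means a failure was found)")
```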
We allow for nonlinear effects in the likelihood analysis of peculiar velocities, and obtain $\sim 35\%$ lower values for the cosmological density parameter and for the amplitude of mass-density fluctuations. The power spectrum in the linear regime is assumed to be that of the flat $\Lambda$CDM model ($h=0.65$, $n=1$) with only $\Omega_m$ free. Since the likelihood is driven by the nonlinear regime, we break the power spectrum at $k_b = 0.2\,h\,\mathrm{Mpc}^{-1}$ and fit a two-parameter power law at $k > k_b$. This allows for an unbiased fit in the linear regime. Tests using improved mock catalogs demonstrate a reduced bias and a better fit. For the Mark III and SFI data we find $\Omega_m = 0.35 \pm 0.09$ with $\sigma_8 \Omega_m^{0.6} = 0.55 \pm 0.10$ (90% errors). When allowing deviations from $\Lambda$CDM, we find an indication of a wiggle in the power spectrum, in the form of an excess near $k \sim 0.05$ and a deficiency at $k \sim 0.1\,h\,\mathrm{Mpc}^{-1}$ --- a cold flow that may be related to a feature indicated by redshift surveys and to the second peak in the CMB anisotropy. A $\chi^2$ test applied to principal modes demonstrates that the nonlinear procedure improves the goodness of fit. Principal component analysis (PCA) helps identify spatial features of the data and fine-tune the theoretical and error models. We address the potential for optimal data compression using PCA.
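One plausible reading of the two-parameter break, written out for clarity (the exact functional form and matching convention are assumptions of this sketch, not taken from the paper):

```latex
\[
  P(k) \;=\;
  \begin{cases}
    P_{\Lambda\mathrm{CDM}}(k;\,\Omega_m), & k \le k_b = 0.2\,h\,\mathrm{Mpc}^{-1},\\[4pt]
    A \,(k/k_b)^{\mu},                     & k > k_b,
  \end{cases}
\]
```

with the amplitude $A$ and slope $\mu$ serving as the two free nonlinear-regime parameters.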
Numerous Bayesian Network (BN) structure learning algorithms have been proposed in the literature over the past few decades. Each publication makes an empirical or theoretical case for the algorithm it proposes, and results across studies are often inconsistent in their claims about which algorithm is best. This is partly because there is no agreed evaluation approach for determining their effectiveness. Moreover, each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tends to be evaluated with data that conforms to these assumptions, however unrealistic these assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this happens remains unknown. This paper investigates the performance of 15 structure learning algorithms. We propose a methodology that applies the algorithms to data that incorporates synthetic noise, in an effort to better understand the performance of structure learning algorithms when applied to real data. Each algorithm is tested over multiple case studies, sample sizes, and types of noise, and assessed with multiple evaluation criteria. This work involved approximately 10,000 graphs with a total structure learning runtime of seven months. It provides the first large-scale empirical validation of BN structure learning algorithms under different assumptions of data noise. The results suggest that traditional synthetic performance may overestimate real-world performance by anywhere from 10% to more than 50%. They also show that while score-based learning is generally superior to constraint-based learning, a higher fitting score does not necessarily imply a more accurate causal graph. To facilitate comparisons with future studies, we have made all data, raw results, graphs and BN models freely available online.
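To make the noise-injection step concrete, here is a hedged sketch of the kind of corruption that can be applied to data sampled from a ground-truth BN before structure learning is run; the rates, column names, and noise types below are illustrative rather than those used in the paper:

```python
# Sketch: corrupt clean BN samples with missingness, measurement error,
# and a latent (dropped) variable before handing them to a learner.
import numpy as np
import pandas as pd

def add_noise(df, miss_rate=0.1, corrupt_rate=0.05, latent_cols=("Z",), seed=0):
    rng = np.random.default_rng(seed)
    noisy = df.copy()
    # 1) Missing values (MCAR): blank out a random fraction of cells.
    mask = rng.random(noisy.shape) < miss_rate
    noisy = noisy.mask(pd.DataFrame(mask, index=noisy.index, columns=noisy.columns))
    # 2) Measurement error: replace a fraction of cells with random valid states.
    for col in noisy.columns:
        states = df[col].dropna().unique()
        corrupt = rng.random(len(noisy)) < corrupt_rate
        noisy.loc[corrupt, col] = rng.choice(states, size=corrupt.sum())
    # 3) Latent confounding: drop columns so they become unobserved.
    return noisy.drop(columns=[c for c in latent_cols if c in noisy.columns])
```

The learned graphs can then be scored against the ground-truth structure under each noise setting, which is the comparison the methodology above relies on.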