No Arabic abstract
Independent component analysis (ICA) has been widely used for blind source separation in many fields such as brain imaging analysis, signal processing and telecommunication. Many statistical techniques based on M-estimates have been proposed for estimating the mixing matrix. Recently, several nonparametric methods have been developed, but in-depth analysis of asymptotic efficiency has not been available. We analyze ICA using semiparametric theories and propose a straightforward estimate based on the efficient score function by using B-spline approximations. The estimate is asymptotically efficient under moderate conditions and exhibits better performance than standard ICA methods in a variety of simulations.
Compositional data represent a specific family of multivariate data, where the information of interest is contained in the ratios between parts rather than in absolute values of single parts. The analysis of such specific data is challenging as the application of standard multivariate analysis tools on the raw observations can lead to spurious results. Hence, it is appropriate to apply certain transformations prior further analysis. One popular multivariate data analysis tool is independent component analysis. Independent component analysis aims to find statistically independent components in the data and as such might be seen as an extension to principal component analysis. In this paper we examine an approach of how to apply independent component analysis on compositional data by respecting the nature of the former and demonstrate the usefulness of this procedure on a metabolomic data set.
Independent component analysis (ICA) has been shown to be useful in many applications. However, most ICA methods are sensitive to data contamination and outliers. In this article we introduce a general minimum U-divergence framework for ICA, which covers some standard ICA methods as special cases. Within the U-family we further focus on the gamma-divergence due to its desirable property of super robustness, which gives the proposed method gamma-ICA. Statistical properties and technical conditions for the consistency of gamma-ICA are rigorously studied. In the limiting case, it leads to a necessary and sufficient condition for the consistency of MLE-ICA. This necessary and sufficient condition is weaker than the condition known in the literature. Since the parameter of interest in ICA is an orthogonal matrix, a geometrical algorithm based on gradient flows on special orthogonal group is introduced to implement gamma-ICA. Furthermore, a data-driven selection for the gamma value, which is critical to the achievement of gamma-ICA, is developed. The performance, especially the robustness, of gamma-ICA in comparison with standard ICA methods is demonstrated through experimental studies using simulated data and image data.
Bayesian nonparametric priors based on completely random measures (CRMs) offer a flexible modeling approach when the number of latent components in a dataset is unknown. However, managing the infinite dimensionality of CRMs typically requires practitioners to derive ad-hoc algorithms, preventing the use of general-purpose inference methods and often leading to long compute times. We propose a general but explicit recipe to construct a simple finite-dimensional approximation that can replace the infinite-dimensional CRMs. Our independent finite approximation (IFA) is a generalization of important cases that are used in practice. The independence of atom weights in our approximation (i) makes the construction well-suited for parallel and distributed computation and (ii) facilitates more convenient inference schemes. We quantify the approximation error between IFAs and the target nonparametric prior. We compare IFAs with an alternative approximation scheme -- truncated finite approximations (TFAs), where the atom weights are constructed sequentially. We prove that, for worst-case choices of observation likelihoods, TFAs are a more efficient approximation than IFAs. However, in real-data experiments with image denoising and topic modeling, we find that IFAs perform very similarly to TFAs in terms of task-specific accuracy metrics.
We propose a nonparametric method to explicitly model and represent the derivatives of smooth underlying trajectories for longitudinal data. This representation is based on a direct Karhunen--Lo`eve expansion of the unobserved derivatives and leads to the notion of derivative principal component analysis, which complements functional principal component analysis, one of the most popular tools of functional data analysis. The proposed derivative principal component scores can be obtained for irregularly spaced and sparsely observed longitudinal data, as typically encountered in biomedical studies, as well as for functional data which are densely measured. Novel consistency results and asymptotic convergence rates for the proposed estimates of the derivative principal component scores and other components of the model are derived under a unified scheme for sparse or dense observations and mild conditions. We compare the proposed representations for derivatives with alternative approaches in simulation settings and also in a wallaby growth curve application. It emerges that representations using the proposed derivative principal component analysis recover the underlying derivatives more accurately compared to principal component analysis-based approaches especially in settings where the functional data are represented with only a very small number of components or are densely sampled. In a second wheat spectra classification example, derivative principal component scores were found to be more predictive for the protein content of wheat than the conventional functional principal component scores.
Independent component analysis (ICA) has been a popular dimension reduction tool in statistical machine learning and signal processing. In this paper, we present a convergence analysis for an online tensorial ICA algorithm, by viewing the problem as a nonconvex stochastic approximation problem. For estimating one component, we provide a dynamics-based analysis to prove that our online tensorial ICA algorithm with a specific choice of stepsize achieves a sharp finite-sample error bound. In particular, under a mild assumption on the data-generating distribution and a scaling condition such that $d^4/T$ is sufficiently small up to a polylogarithmic factor of data dimension $d$ and sample size $T$, a sharp finite-sample error bound of $tilde{O}(sqrt{d/T})$ can be obtained.