No Arabic abstract
Astrophysicists are interested in recovering the 3D gas emissivity of a galaxy cluster from a 2D image taken by a telescope. A blurring phenomenon and presence of point sources make this inverse problem even harder to solve. The current state-of-the-art technique is two step: first identify the location of potential point sources, then mask these locations and deproject the data. We instead model the data as a Poisson generalized linear model (involving blurring, Abel and wavelets operators) regularized by two lasso penalties to induce sparse wavelet representation and sparse point sources. The amount of sparsity is controlled by two quantile universal thresholds. As a result, our method outperforms the existing one.
Cryo-electron microscopy (cryo-EM) is an emerging experimental method to characterize the structure of large biomolecular assemblies. Single particle cryo-EM records 2D images (so-called micrographs) of projections of the three-dimensional particle, which need to be processed to obtain the three-dimensional reconstruction. A crucial step in the reconstruction process is particle picking which involves detection of particles in noisy 2D micrographs with low signal-to-noise ratios of typically 1:10 or even lower. Typically, each picture contains a large number of particles, and particles have unknown irregular and nonconvex shapes.
We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology [Descloux and Sardy, 2018], initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased for variable selection with missing values in the covariates. In addition to not requiring the specification of a model for the covariates, nor estimating their covariance matrix or the noise variance, the method has the great advantage of handling missing not-at random values without specifying a parametric model. Numerical experiments and a medical application underline the relevance of Robust Lasso-Zero in such a context with few available competitors. The method is easy to use and implemented in the R library lass0.
Microbiome data analyses require statistical models that can simultaneously decode microbes reactions to the environment and interactions among microbes. While a multiresponse linear regression model seems like a straightforward solution, we argue that treating it as a graphical model is flawed given that the regression coefficient matrix does not encode the conditional dependence structure between response and predictor nodes because it does not represent the adjacency matrix. This observation is especially important in biological settings when we have prior knowledge on the edges from specific experimental interventions that can only be properly encoded under a conditional dependence model. Here, we propose a chain graph model with two sets of nodes (predictors and responses) whose solution yields a graph with edges that indeed represent conditional dependence and thus, agrees with the experimenters intuition on the average behavior of nodes under treatment. The solution to our model is sparse via Bayesian LASSO and is also guaranteed to be the sparse solution to a Conditional Auto-Regressive (CAR) model. In addition, we propose an adaptive extension so that different shrinkage can be applied to different edges to incorporate edge-specific prior knowledge. Our model is computationally inexpensive through an efficient Gibbs sampling algorithm and can account for binary, counting, and compositional responses via appropriate hierarchical structure. We apply our model to a human gut and a soil microbial compositional datasets and we highlight that CAR-LASSO can estimate biologically meaningful network structures in the data. The CAR-LASSO software is available as an R package at https://github.com/YunyiShen/CAR-LASSO.
The aim of online monitoring is to issue an alarm as soon as there is significant evidence in the collected observations to suggest that the underlying data generating mechanism has changed. This work is concerned with open-end, nonparametric procedures that can be interpreted as statistical tests. The proposed monitoring schemes consist of computing the so-called retrospective CUSUM statistic (or minor variations thereof) after the arrival of each new observation. After proposing suitable threshold functions for the chosen detectors, the asymptotic validity of the procedures is investigated in the special case of monitoring for changes in the mean, both under the null hypothesis of stationarity and relevant alternatives. To carry out the sequential tests in practice, an approach based on an asymptotic regression model is used to estimate high quantiles of relevant limiting distributions. Monte Carlo experiments demonstrate the good finite-sample behavior of the proposed monitoring schemes and suggest that they are superior to existing competitors as long as changes do not occur at the very beginning of the monitoring. Extensions to statistics exhibiting an asymptotic mean-like behavior are briefly discussed. Finally, the application of the derived sequential change-point detection tests is succinctly illustrated on temperature anomaly data.
We present $textbf{PyRMLE}$, a Python module that implements Regularized Maximum Likelihood Estimation for the analysis of Random Coefficient models. $textbf{PyRMLE}$ is simple to use and readily works with data formats that are typical to Random Coefficient problems. The module makes use of Pythons scientific libraries $textbf{NumPy}$ and $textbf{SciPy}$ for computational efficiency. The main implementation of the algorithm is executed purely in Python code which takes advantage of Pythons high-level features.