No Arabic abstract
This paper focusses on safe screening techniques for the LASSO problem. Motivated by the need for low-complexity algorithms, we propose a new approach, dubbed joint screening test, allowing to screen a set of atoms by carrying out one single test. The approach is particularized to two different sets of atoms, respectively expressed as sphere and dome regions. After presenting the mathematical derivations of the tests, we elaborate on their relative effectiveness and discuss the practical use of such procedures.
Classical methods for psychometric function estimation either require excessive measurements or produce only a low-resolution approximation of the target psychometric function. In this paper, we propose a novel solution for rapid screening for a change in the psychometric function estimation of a given patient. We use Bayesian active model selection to perform an automated pure-tone audiogram test with the goal of quickly finding if the current audiogram will be different from a previous audiogram. We validate our approach using audiometric data from the National Institute for Occupational Safety and Health NIOSH. Initial results show that with a few tones we can detect if the patients audiometric function has changed between the two test sessions with high confidence.
We consider learning an undirected graphical model from sparse data. While several efficient algorithms have been proposed for graphical lasso (GL), the alternating direction method of multipliers (ADMM) is the main approach taken concerning for joint graphical lasso (JGL). We propose proximal gradient procedures with and without a backtracking option for the JGL. These procedures are first-order and relatively simple, and the subproblems are solved efficiently in closed form. We further show the boundedness for the solution of the JGL problem and the iterations in the algorithms. The numerical results indicate that the proposed algorithms can achieve high accuracy and precision, and their efficiency is competitive with state-of-the-art algorithms.
We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test achieves the significance level and that the bootstrap test achieves pointwise asymptotic significance level as well as pointwise asymptotic consistency (i.e., it is able to detect any type of fixed dependence in the large sample limit). The Gamma approximation does not come with these guarantees; however, it is computationally very fast and for small $d$, it performs well in practice. Finally, we apply the test to a problem in causal discovery.
We propose a novel transfer learning approach for orphan screening called corresponding projections. In orphan screening the learning task is to predict the binding affinities of compounds to an orphan protein, i.e., one for which no training data is available. The identification of compounds with high affinity is a central concern in medicine since it can be used for drug discovery and design. Given a set of prediction models for proteins with labelled training data and a similarity between the proteins, corresponding projections constructs a model for the orphan protein from them such that the similarity between models resembles the one between proteins. Under the assumption that the similarity resemblance holds, we derive an efficient algorithm for kernel methods. We empirically show that the approach outperforms the state-of-the-art in orphan screening.
Using the $ell_1$-norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm, which is a convex surrogate of the rank, of the selected covariates as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net.