No Arabic abstract
Data clustering is a fundamental problem with a wide range of applications. Standard methods, eg the $k$-means method, usually require solving a non-convex optimization problem. Recently, total variation based convex relaxation to the $k$-means model has emerged as an attractive alternative for data clustering. However, the existing results on its exact clustering property, ie, the condition imposed on data so that the method can provably give correct identification of all cluster memberships, is only applicable to very specific data and is also much more restrictive than that of some other methods. This paper aims at the revisit of total variation based convex clustering, by proposing a weighted sum-of-$ell_1$-norm relating convex model. Its exact clustering property established in this paper, in both deterministic and probabilistic context, is applicable to general data and is much sharper than the existing results. These results provided good insights to advance the research on convex clustering. Moreover, the experiments also demonstrated that the proposed convex model has better empirical performance when be compared to standard clustering methods, and thus it can see its potential in practice.
Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision. In this work, we propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class. Our preconditioner is driven by a sharp analysis of the local linear convergence rate depending on the active set at the current iterate. We show that nested-forest decomposition of the inactive edges yields a guaranteed local linear convergence rate. Further, we propose a practical greedy heuristic which realizes such nested decompositions and show in several numerical experiments that our reconditioning strategy, when applied to proximal gradient or primal-dual hybrid gradient algorithm, achieves competitive performances. Our results suggest that local convergence analysis can serve as a guideline for selecting variable metrics in proximal algorithms.
We consider total variation minimization for manifold valued data. We propose a cyclic proximal point algorithm and a parallel proximal point algorithm to minimize TV functionals with $ell^p$-type data terms in the manifold case. These algorithms are based on iterative geodesic averaging which makes them easily applicable to a large class of data manifolds. As an application, we consider denoising images which take their values in a manifold. We apply our algorithms to diffusion tensor images, interferometric SAR images as well as sphere and cylinder valued images. For the class of Cartan-Hadamard manifolds (which includes the data space in diffusion tensor imaging) we show the convergence of the proposed TV minimizing algorithms to a global minimizer.
We propose a new framework for deriving screening rules for convex optimization problems. Our approach covers a large class of constrained and penalized optimization formulations, and works in two steps. First, given any approximate point, the structure of the objective function and the duality gap is used to gather information on the optimal solution. In the second step, this information is used to produce screening rules, i.e. safely identifying unimportant weight variables of the optimal solution. Our general framework leads to a large variety of useful existing as well as new screening rules for many applications. For example, we provide new screening rules for general simplex and $L_1$-constrained problems, Elastic Net, squared-loss Support Vector Machines, minimum enclosing ball, as well as structured norm regularized problems, such as group lasso.
The aim of this paper is to address optimality of stochastic control strategies via dynamic programming subject to total variation distance ambiguity on the conditional distribution of the controlled process. We formulate the stochastic control problem using minimax theory, in which the control minimizes the pay-off while the conditional distribution, from the total variation distance set, maximizes it. First, we investigate the maximization of a linear functional on the space of probability measures on abstract spaces, among those probability measures which are within a total variation distance from a nominal probability measure, and then we give the maximizing probability measure in closed form. Second, we utilize the solution of the maximization to solve minimax stochastic control with deterministic control strategies, under a Markovian and a non-Markovian assumption, on the conditional distributions of the controlled process. The results of this part include: 1) Minimax optimization subject to total variation distance ambiguity constraint; 2) new dynamic programming recursions, which involve the oscillator seminorm of the value function, in addition to the standard terms; 3) new infinite horizon discounted dynamic programming equation, the associated contractive property, and a new policy iteration algorithm. Finally, we provide illustrative examples for both the finite and infinite horizon cases. For the infinite horizon case we invoke the new policy iteration algorithm to compute the optimal strategies.
A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a nonparametric model by the sum of univariate functions with respect to each explanatory variable, i.e., $f({mathbf x}) = sum f_j(x_j)$, where $x_jinmathbb{R}$ is $j$-th component of a sample ${mathbf x}in mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of functions in $L^1_{rm c}(mathbb{R})$-space. Our analysis shows that a GAM based on TV-regularization exhibits a Rademacher complexity of $O(sqrt{frac{log p}{m}})$, which is tight in terms of both $m$ and $p$ in the agnostic case of the classification problem. In result, we obtain generalization error bounds for finite samples according to work by Bartlett and Mandelson (2002).