
Recent Advances in Algorithmic High-Dimensional Robust Statistics

 Added by Ilias Diakonikolas
 Publication date 2019
Research language: English





Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust mean estimation under natural distributional assumptions, no efficient algorithm was known. Recent work in theoretical computer science gave the first efficient robust estimators for a number of fundamental statistical tasks, including mean and covariance estimation. Since then, there has been a flurry of research activity on algorithmic high-dimensional robust estimation in a range of settings. In this survey article, we introduce the core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics with a focus on robust mean estimation. We also provide an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks and discuss new directions and opportunities for future work.
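To make the robust mean estimation task concrete, here is a minimal sketch of an iterative filtering heuristic. It is not the survey's algorithm: it assumes the inliers have roughly identity covariance, and the names `filtered_mean` and `top_direction`, the variance `threshold` of 1.5, and the rule of discarding an `eps / 2` fraction per round are illustrative assumptions. The idea is that outliers which shift the mean noticeably must also inflate the variance along some direction, so the sketch repeatedly finds the direction of largest variance and drops the points that deviate most along it.

```python
import math
import random


def mean(points):
    """Coordinate-wise sample mean of a list of equal-length vectors."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def top_direction(points, mu, rng, iters=30):
    """Power iteration on the empirical covariance around mu.

    Returns (v, lam): an approximate top eigenvector and the variance of the
    points along it. The covariance matrix is never formed explicitly.
    """
    d = len(mu)
    centered = [[p[j] - mu[j] for j in range(d)] for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [0.0] * d
        for c in centered:
            s = sum(cj * vj for cj, vj in zip(c, v))
            for j in range(d):
                w[j] += s * c[j]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # No spread at all along v (e.g. all points identical).
            return v, 0.0
        v = [x / norm for x in w]
    lam = sum(sum(cj * vj for cj, vj in zip(c, v)) ** 2 for c in centered)
    return v, lam / len(centered)


def filtered_mean(points, eps, threshold=1.5, max_rounds=50, seed=0):
    """Heuristic robust mean, ASSUMING inliers have about identity covariance.

    While the largest directional variance exceeds `threshold` (an assumed
    slack above 1), drop the eps/2 fraction of points that deviate most
    along that direction, then recompute.
    """
    rng = random.Random(seed)
    pts = list(points)
    for _ in range(max_rounds):
        mu = mean(pts)
        v, lam = top_direction(pts, mu, rng)
        if lam <= threshold:
            return mu
        k = max(1, int(0.5 * eps * len(pts)))
        # Sort by squared deviation along v and cut the k largest.
        pts.sort(key=lambda p: sum((pj - mj) * vj
                                   for pj, mj, vj in zip(p, mu, v)) ** 2)
        pts = pts[:len(pts) - k]
    return mean(pts)


if __name__ == "__main__":
    rng = random.Random(1)
    d = 10
    # 450 standard Gaussian inliers plus 50 identical outliers (10%).
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(450)]
    data += [[5.0] * d for _ in range(50)]
    for name, est in (("sample mean", mean(data)),
                      ("filtered mean", filtered_mean(data, eps=0.1))):
        print(name, "error:", math.sqrt(sum(x * x for x in est)))
```

In the demo the true mean is the origin, so the printed norms are the estimation errors. The sketch only illustrates the general filtering idea on a toy instance and carries none of the formal guarantees discussed in the survey.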




Read More

Random graph models are frequently used as a controllable and versatile data source for experimental campaigns in various research fields. Generating such data-sets at scale is a non-trivial task as it requires design decisions typically spanning multiple areas of expertise. Challenges begin with the identification of relevant domain-specific network features, continue with the question of how to compile such features into a tractable model, and culminate in algorithmic details arising while implementing the pertaining model. In the present survey, we explore crucial aspects of random graph models with known scalable generators. We begin by briefly introducing network features considered by such models, and then discuss random graphs alongside generation algorithms. Our focus lies on modelling techniques and algorithmic primitives that have proven successful in obtaining massive graphs. We consider concepts and graph models for various domains (such as social networks, infrastructure, ecology, and numerical simulations), and discuss generators for different models of computation (including shared-memory parallelism, massively parallel GPUs, and distributed systems).
In recent years, significant advances have been made in the design and analysis of fully dynamic algorithms. However, these theoretical results have received very little attention from the practical perspective. Few of the algorithms are implemented and tested on real datasets, and their practical potential is far from understood. Here, we present a quick reference guide to recent engineering and theory results in the area of fully dynamic graph algorithms.
The notion of directed treewidth was introduced by Johnson, Robertson, Seymour and Thomas [Journal of Combinatorial Theory, Series B, Vol 82, 2001] as a first step towards an algorithmic metatheory for digraphs. They showed that some NP-complete properties such as Hamiltonicity can be decided in polynomial time on digraphs of constant directed treewidth. Nevertheless, despite more than one decade of intensive research, the list of hard combinatorial problems that are known to be solvable in polynomial time when restricted to digraphs of constant directed treewidth has remained scarce. In this work we enrich this list by providing for the first time an algorithmic metatheorem connecting the monadic second order logic of graphs to directed treewidth. We show that most of the known positive algorithmic results for digraphs of constant directed treewidth can be reformulated in terms of our metatheorem. Additionally, we show how to use our metatheorem to provide polynomial time algorithms for two classes of combinatorial problems that have not yet been studied in the context of directed width measures. More precisely, for each fixed $k, w \in \mathbb{N}$, we show how to count in polynomial time on digraphs of directed treewidth $w$, the number of minimum spanning strong subgraphs that are the union of $k$ directed paths, and the number of maximal subgraphs that are the union of $k$ directed paths and satisfy a given minor closed property. To prove our metatheorem we devise two technical tools which we believe to be of independent interest. First, we introduce the notion of tree-zig-zag number of a digraph, a new directed width measure that is at most a constant times directed treewidth. Second, we introduce the notion of $z$-saturated tree slice language, a new formalism for the specification and manipulation of infinite sets of digraphs.
Cut problems form one of the most fundamental classes of problems in algorithmic graph theory. For instance, the minimum cut, the minimum $s$-$t$ cut, the minimum multiway cut, and the minimum $k$-way cut are some of the commonly encountered cut problems. Many of these problems have been extensively studied over several decades. In this paper, we initiate the algorithmic study of some cut problems in high dimensions. The first problem we study, namely, Topological Hitting Set (THS), is defined as follows: Given a nontrivial $r$-cycle $\zeta$ in a simplicial complex $\mathsf{K}$, find a set $\mathcal{S}$ of $r$-dimensional simplices of minimum cardinality so that $\mathcal{S}$ meets every cycle homologous to $\zeta$. Our main result is that this problem admits a polynomial-time solution on triangulations of closed surfaces. Interestingly, the optimal solution is given in terms of the cocycles of the surface. For general complexes, we show that THS is W[1]-hard with respect to the solution size $k$. On the positive side, we show that THS admits an FPT algorithm with respect to $k+d$, where $d$ is the maximum degree of the Hasse graph of the complex $\mathsf{K}$. We also define a problem called Boundary Nontrivialization (BNT): Given a bounding $r$-cycle $\zeta$ in a simplicial complex $\mathsf{K}$, find a set $\mathcal{S}$ of $(r+1)$-dimensional simplices of minimum cardinality so that the removal of $\mathcal{S}$ from $\mathsf{K}$ makes $\zeta$ non-bounding. We show that BNT is W[1]-hard with respect to the solution size as the parameter, and has an $O(\log n)$-approximation FPT algorithm for $(r+1)$-dimensional complexes with the $(r+1)$-th Betti number $\beta_{r+1}$ as the parameter. Finally, we provide randomized (approximation) FPT algorithms for the global variants of THS and BNT.
Recently, Ermon et al. (2013) pioneered a way to practically compute approximations to large-scale counting or discrete integration problems by using random hashes. The hashes are used to reduce the counting problem to many separate discrete optimization problems. The optimization problems can then be solved by an NP-oracle such as commercial SAT solvers or integer linear programming (ILP) solvers. In particular, Ermon et al. showed that if the domain of integration is $\{0,1\}^n$ then it is possible to obtain a solution within a factor of $16$ of the optimal (a 16-approximation) by this technique. In many crucial counting tasks, such as computation of the partition function of the ferromagnetic Potts model, the domain of integration is naturally the hypergrid $\{0,1,\dots,q-1\}^n$, $q>2$. The straightforward extension of Ermon et al.'s method allows a $q^2$-approximation for this problem. For large values of $q$, this is undesirable. In this paper, we show an improved technique to obtain an approximation factor of $4+O(1/q^2)$ for this problem. We are able to achieve this by using an idea of optimization over multiple bins of the hash functions, which can be easily implemented by inequality constraints, or even in an unconstrained way. Also, the burden on the NP-oracle is not increased by our method (an ILP solver can still be used). We provide experimental simulation results to support the theoretical guarantees of our algorithms.
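The hashing idea in the abstract above can be sketched on a toy problem. The code below is neither Ermon et al.'s algorithm nor the hypergrid method of that paper: it is a simplified, unweighted variant that estimates the size of a set inside $\{0,1\}^n$ by adding random parity (XOR) constraints until the set is usually empty. The NP-oracle is replaced by brute-force enumeration, and the names `estimate_count`, `oracle_has_solution`, and the default of 15 trials are assumptions made for illustration. Each random parity constraint halves the set in expectation, so if about $m$ constraints are needed to empty it, the set has roughly $2^m$ elements.

```python
import random
from itertools import product


def random_parity_constraints(n, m, rng):
    """Draw m random affine parity constraints a.x = b (mod 2) over n bits."""
    return [([rng.randrange(2) for _ in range(n)], rng.randrange(2))
            for _ in range(m)]


def satisfies(x, constraints):
    """True if the bit vector x meets every parity constraint."""
    return all(sum(aj * xj for aj, xj in zip(a, x)) % 2 == b
               for a, b in constraints)


def oracle_has_solution(n, in_set, constraints):
    """Brute-force stand-in for the NP-oracle (a SAT or ILP solver in practice).

    Answers: does some x in the set satisfy all parity constraints?
    """
    return any(in_set(x) and satisfies(x, constraints)
               for x in product((0, 1), repeat=n))


def estimate_count(n, in_set, trials=15, seed=0):
    """Return a power-of-two estimate of |{x in {0,1}^n : in_set(x)}|.

    For m = 0, 1, ..., n, run `trials` independent experiments with m random
    parity constraints. Stop at the first m where at most half of the
    experiments still find a solution; the estimate is 2^(last passing m),
    or 0 if even m = 0 fails (empty set).
    """
    rng = random.Random(seed)
    estimate = 0
    for m in range(n + 1):
        hits = sum(
            oracle_has_solution(n, in_set, random_parity_constraints(n, m, rng))
            for _ in range(trials)
        )
        if 2 * hits <= trials:
            break
        estimate = 2 ** m
    return estimate


if __name__ == "__main__":
    # Toy set: 8-bit vectors whose first four bits are zero (16 elements).
    print(estimate_count(8, lambda x: all(b == 0 for b in x[:4])))
```

The estimate is only accurate up to a constant factor, which matches the flavor of the approximation guarantees quoted above. The weighted (integration) setting and the multi-bin optimization described in the abstract are not modeled here.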