Approximate Nearest Neighbors in the Space of Persistence Diagrams

60 0 0.0 ( 0 )

Download Cite

Added by Brittany Terese Fasy

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Brittany Terese Fasy - Xiaozhou He - Zhihui Liu

Computational Geometry

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Persistence diagrams are important tools in the field of topological data analysis that describe the presence and magnitude of features in a filtered topological space. However, current approaches for comparing a persistence diagram to a set of other persistence diagrams is linear in the number of diagrams or do not offer performance guarantees. In this paper, we apply concepts from locality-sensitive hashing to support approximate nearest neighbor search in the space of persistence diagrams. Given a set $Gamma$ of $n$ $(M,m)$-bounded persistence diagrams, each with at most $m$ points, we snap-round the points of each diagram to points on a cubical lattice and produce a key for each possible snap-rounding. Specifically, we fix a grid over each diagram at several resolutions and consider the snap-roundings of each diagram to the four nearest lattice points. Then, we propose a data structure with $tau$ levels $mathbb{D}_{tau}$ that stores all snap-roundings of each persistence diagram in $Gamma$ at each resolution. This data structure has size $O(n5^mtau)$ to account for varying lattice resolutions as well as snap-roundings and the deletion of points with low persistence. To search for a persistence diagram, we compute a key for a query diagram by snapping each point to a lattice and deleting points of low persistence. Furthermore, as the lattice parameter decreases, searching our data structure yields a six-approximation of the nearest diagram in $Gamma$ in $O((mlog{n}+m^2)logtau)$ time and a constant factor approximation of the $k$th nearest diagram in $O((mlog{n}+m^2+k)logtau)$ time.

rate research

Sketching Persistence Diagrams

104 - Donald R. Sheehy , Siddharth Sheth 2020

Given a persistence diagram with $n$ points, we give an algorithm that produces a sequence of $n$ persistence diagrams converging in bottleneck distance to the input diagram, the $i$th of which has $i$ distinct (weighted) points and is a $2$-approximation to the closest persistence diagram with that many distinct points. For each approximation, we precompute the optimal matching between the $i$th and the $(i+1)$st. Perhaps surprisingly, the entire sequence of diagrams as well as the sequence of matchings can be represented in $O(n)$ space. The main approach is to use a variation of the greedy permutation of the persistence diagram to give good Hausdorff approximations and assign weights to these subsets. We give a new algorithm to efficiently compute this permutation, despite the high implicit dimension of points in a persistence diagram due to the effect of the diagonal. The sketches are also structured to permit fast (linear time) approximations to the Hausdorff distance between diagrams -- a lower bound on the bottleneck distance. For approximating the bottleneck distance, sketches can also be used to compute a linear-size neighborhood graph directly, obviating the need for geometric data structures used in state-of-the-art methods for bottleneck computation.

Computational Geometry

Approximate nearest neighbors search without false negatives for $l_2$ for $c>sqrt{loglog{n}}$

123 - Piotr Sankowski , Piotr Wygocki 2017

In this paper, we report progress on answering the open problem presented by Pagh~[14], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest neighbors problem without false negatives for Euclidean high dimensional space $mathcal{R}^d$. These data structures work for any $c = omega(sqrt{log{log{n}}})$, where $n$ is the number of points in the input set, with poly-logarithmic query time and polynomial preprocessing time. This improves over the known algorithms, which require $c$ to be $Omega(sqrt{d})$. This improvement is obtained by applying a sequence of reductions, which are interesting on their own. First, we reduce the problem to $d$ instances of dimension logarithmic in $n$. Next, these instances are reduced to a number of $c$-approximate nearest neighbor search instances in $big(mathbb{R}^kbig)^L$ space equipped with metric $m(x,y) = max_{1 le i le L}(lVert x_i - y_irVert_2)$.

Computational Geometry Data Structures and Algorithms

Understanding the Topology and the Geometry of the Space of Persistence Diagrams via Optimal Partial Transport

184 - Vincent Divol 2019

Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the persistence diagrams space. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Frechet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.

Computational Geometry Geometric Topology

Realizable piecewise linear paths of persistence diagrams with Reeb graphs

129 - Rehab Alharbi , Erin Wolf Chambers , Elizabeth Munch 2021

Reeb graphs are widely used in a range of fields for the purposes of analyzing and comparing complex spaces via a simpler combinatorial object. Further, they are closely related to extended persistence diagrams, which largely but not completely encode the information of the Reeb graph. In this paper, we investigate the effect on the persistence diagram of a particular continuous operation on Reeb graphs; namely the (truncated) smoothing operation. This construction arises in the context of the Reeb graph interleaving distance, but separately from that viewpoint provides a simplification of the Reeb graph which continuously shrinks small loops. We then use this characterization to initiate the study of inverse problems for Reeb graphs using smoothing by showing which paths in persistence diagram space (commonly known as vineyards) can be realized by a path in the space of Reeb graphs via these simple operations. This allows us to solve the inverse problem on a certain family of piecewise linear vineyards when fixing an initial Reeb graph.

Computational Geometry

Approximation algorithms for 1-Wasserstein distance between persistence diagrams

91 - Samantha Chen , Yusu Wang 2021

Recent years have witnessed a tremendous growth using topological summaries, especially the persistence diagrams (encoding the so-called persistent homology) for analyzing complex shapes. Intuitively, persistent homology maps a potentially complex input object (be it a graph, an image, or a point set and so on) to a unified type of feature summary, called the persistence diagrams. One can then carry out downstream data analysis tasks using such persistence diagram representations. A key problem is to compute the distance between two persistence diagrams efficiently. In particular, a persistence diagram is essentially a multiset of points in the plane, and one popular distance is the so-called 1-Wasserstein distance between persistence diagrams. In this paper, we present two algorithms to approximate the 1-Wasserstein distance for persistence diagrams in near-linear time. These algorithms primarily follow the same ideas as two existing algorithms to approximate optimal transport between two finite point-sets in Euclidean spaces via randomly shifted quadtrees. We show how these algorithms can be effectively adapted for the case of persistence diagrams. Our algorithms are much more efficient than previous exact and approximate algorithms, both in theory and in practice, and we demonstrate its efficiency via extensive experiments. They are conceptually simple and easy to implement, and the code is publicly available in github.

Computational Geometry Data Structures and Algorithms