In this paper we present novel algorithms for several multidimensional data processing problems. We consider problems related to the computation of restricted clusters and of the diameter of a set of points using a new distance function. We also consider two string (one-dimensional data) processing problems, concerning an optimal encoding method and counting the number of occurrences of a substring within a string generated by a grammar. The algorithms have been thoroughly analyzed from a theoretical point of view, and some of them have also been evaluated experimentally.
We consider the problem of computing the diameter of a unicycle graph (i.e., a graph with a unique cycle). We present an O(n) time algorithm for the problem, where n is the number of vertices of the graph. This improves the previous best O(n log n) time solution [Oh and Ahn, ISAAC 2016]. Using this algorithm as a subroutine, we solve the problem of adding a shortcut to a tree so that the diameter of the new graph (which is a unicycle graph) is minimized; our algorithm takes O(n^2 log n) time and O(n) space. The previous best algorithms solve the problem in O(n^2 log^3 n) time and O(n) space [Oh and Ahn, ISAAC 2016], or in O(n^2) time and O(n^2) space [Bilò, ISAAC 2018].
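For intuition, a naive baseline (not the paper's O(n) algorithm): on an unweighted unicycle graph one can compute the diameter by running BFS from every vertex, at O(n^2) cost since the graph has exactly n edges. The graph representation and function names below are illustrative only.

```python
from collections import deque

def eccentricity(adj, src):
    """BFS from src; return the largest hop distance reached."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter_bruteforce(adj):
    """O(n^2) baseline: the diameter is the largest eccentricity."""
    return max(eccentricity(adj, v) for v in adj)

# Unicycle graph: the 4-cycle 0-1-2-3 plus a pendant vertex 4 attached to 0.
adj = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3], 3: [2, 0], 4: [0]}
print(diameter_bruteforce(adj))  # -> 3 (e.g., the path 4-0-1-2)
```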
The diameter, radius and eccentricities are natural graph parameters. While these problems have been studied extensively, there are no known dynamic algorithms for them beyond the ones that follow from trivial recomputation after each update or from solving dynamic All-Pairs Shortest Paths (APSP), which is very computationally intensive. This is the situation for dynamic approximation algorithms as well, even when only edge insertions or edge deletions need to be supported. This paper provides a comprehensive study of the dynamic approximation of Diameter, Radius and Eccentricities, providing both conditional lower bounds and new algorithms whose bounds are optimal under popular hypotheses in fine-grained complexity. Some of the highlights include:
- Under popular hardness hypotheses, there can be no significantly better fully dynamic approximation algorithms than recomputing the answer after each update, or maintaining full APSP.
- Nearly optimal partially dynamic (incremental/decremental) algorithms can be achieved via efficient reductions to (incremental/decremental) maintenance of Single-Source Shortest Paths. For instance, a nearly $(3/2+\epsilon)$-approximation to Diameter in directed or undirected graphs can be maintained decrementally in total time $m^{1+o(1)}\sqrt{n}/\epsilon^2$. This nearly matches the static $3/2$-approximation algorithm for the problem that is known to be conditionally optimal.
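As background for the single-source reductions in the second highlight (a folklore fact, not a claim of this paper): if $u,v$ attain the diameter $D = d(u,v)$ of a connected undirected graph, then for any vertex $w$ the triangle inequality gives $D = d(u,v) \le d(u,w) + d(w,v) \le 2\,\mathrm{ecc}(w)$, while trivially $\mathrm{ecc}(w) \le D$; so maintaining shortest paths from a single source $w$ already maintains a value in $[D/2,\,D]$, i.e., a 2-approximation of the Diameter.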
Hierarchical clustering is a fundamental task often used to discover meaningful structures in data, such as phylogenetic trees, taxonomies of concepts, subtypes of cancer, and cascades of particle decays in particle physics. Typically, approximate algorithms are used for inference due to the combinatorial number of possible hierarchical clusterings. In contrast to existing methods, we present novel dynamic-programming algorithms for \emph{exact} inference in hierarchical clustering based on a new trellis data structure, and we prove that we can exactly compute the partition function, the maximum likelihood hierarchy, and the marginal probabilities of sub-hierarchies and clusters. Our algorithms scale in time and space proportionally to the size of the power set of the $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies. For larger datasets, where our exact algorithms become infeasible, we also introduce an approximate algorithm based on a sparse trellis that compares well to other benchmarks. Exact methods are relevant to data analyses in particle physics and for finding correlations among gene expression in cancer genomics, and we give examples in both areas, where our algorithms outperform greedy and beam search baselines. In addition, we consider Dasgupta's cost with synthetic data.
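To illustrate why dynamic programming over the power set suffices (an illustrative sketch with my own naming, not the paper's trellis implementation): the number of rooted binary hierarchies over a leaf set $S$ satisfies $T(S) = \tfrac{1}{2}\sum_{\emptyset \neq A \subsetneq S} T(A)\,T(S \setminus A)$ with $T(\{x\}) = 1$, and evaluating it over all $2^N$ subsets reproduces the $(2N-3)!!$ count; the same style of subset recursion is what makes trellis-based computation of quantities such as the partition function tractable.

```python
from functools import lru_cache

def count_hierarchies(n):
    """Count rooted binary hierarchies over n labeled leaves by dynamic
    programming over subsets (bitmasks): 2^n states, O(3^n) total work."""
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def T(mask):
        if mask & (mask - 1) == 0:          # a single leaf
            return 1
        total = 0
        sub = (mask - 1) & mask             # enumerate proper non-empty submasks
        while sub:
            total += T(sub) * T(mask & ~sub)
            sub = (sub - 1) & mask
        return total // 2                   # each split {A, S \ A} is counted twice

    return T(full)

def double_factorial(k):
    return 1 if k <= 0 else k * double_factorial(k - 2)

for n in range(2, 7):
    assert count_hierarchies(n) == double_factorial(2 * n - 3)
print(count_hierarchies(5))  # -> 105 = 7!!
```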
In the past few years, powerful generalizations of the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.
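For reference, the standard definitions behind the Bregman setting (textbook material, not specific to this paper): a strictly convex, differentiable $\phi$ generates the divergence $D_\phi(x,y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y\rangle$; $\phi(x) = \|x\|^2$ recovers the squared Euclidean distance of k-means, while the negative entropy on the probability simplex yields the KL divergence. A minimal sketch:

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2  ->  squared Euclidean distance (the k-means case).
sq = lambda x: np.dot(x, x)
grad_sq = lambda x: 2 * x

# phi(p) = sum_i p_i log p_i (negative entropy on the simplex)  ->  KL divergence.
negent = lambda p: np.sum(p * np.log(p))
grad_negent = lambda p: np.log(p) + 1

x, y = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print(bregman_divergence(sq, grad_sq, x, y))          # ≈ 0.18 = ||x - y||^2
print(bregman_divergence(negent, grad_negent, x, y))  # ≈ 0.1927 = KL(x || y)
```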
We investigate algorithmic control of a large swarm of mobile particles (such as robots, sensors, or building material) that move in a 2D workspace using a global input signal (such as gravity or a magnetic field). We show that a maze of obstacles in the environment can be used to create complex systems, and we provide results for a wide range of questions. These can be subdivided into external algorithmic problems, in which particle configurations serve as input for computations that are performed elsewhere, and internal logic problems, in which the particle configurations themselves are used for carrying out computations. For external algorithms, we give both negative and positive results. If we are given a set of stationary obstacles, we prove that it is NP-hard to decide whether a given initial configuration of unit-sized particles can be transformed into a desired target configuration. Moreover, we show that finding a control sequence of minimum length is PSPACE-complete. We also consider the inverse problem, providing constructive algorithms to design workspaces that efficiently implement arbitrary permutations between different configurations. For internal logic, we investigate how arbitrary computations can be implemented. We demonstrate how to encode dual-rail logic to build a universal logic gate that concurrently evaluates AND, NAND, NOR, and OR operations. Using many of these gates and appropriate interconnects, we can evaluate any logical expression. However, we establish that simulating the full range of complex interactions present in arbitrary digital circuits encounters a fundamental difficulty: a fan-out gate cannot be generated. We resolve this missing component with the help of 2x1 particles, which can create fan-out gates that produce multiple copies of the inputs. Using these gates, we provide rules for replicating arbitrary digital circuits.
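To make the motion model concrete, here is a minimal, illustrative simulation of a single global signal under assumed rules (the grid encoding with '.', '#', 'o' and the function name tilt_right are my own, not from the paper): on a global "move right" command, every unit particle slides right until stopped by an obstacle, the workspace boundary, or another already-blocked particle.

```python
def tilt_right(grid):
    """Apply one global 'right' signal to a grid of '.' (free), '#' (obstacle),
    and 'o' (unit particle): within each row, particles slide right until blocked."""
    out = []
    for row in grid:
        cells = list(row)
        start = 0
        # process each maximal obstacle-free segment; particles pile up at its right end
        for end in range(len(cells) + 1):
            if end == len(cells) or cells[end] == '#':
                k = sum(1 for c in cells[start:end] if c == 'o')
                cells[start:end] = ['.'] * (end - start - k) + ['o'] * k
                start = end + 1
        out.append(''.join(cells))
    return out

workspace = ["o..#.o..",
             ".o...#.o"]
for row in tilt_right(workspace):
    print(row)
# Output:
# ..o#...o
# ....o#.o
```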