أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Martin Grohe

Probabilistic Data with Continuous Distributions

91 - Martin Grohe , Benjamin Lucien Kaminski , Joost-Pieter Katoen andn Peter Lindner 2021

Statistical models of real world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including probabilistic d atabase systems. Yet, the traditional theoretical framework of probabilistic databases focusses entirely on finite probabilistic databases. Only recently, we set out to develop the mathematical theory of infinite probabilistic databases. The present paper is an exposition of two recent papers which are cornerstones of this theory. In (Grohe, Lindner; ICDT 2020) we propose a very general framework for probabilistic databases, possibly involving continuous probability distributions, and show that queries have a well-defined semantics in this framework. In (Grohe, Kaminski, Katoen, Lindner; PODS 2020) we extend the declarative probabilistic programming language Generative Datalog, proposed by (Barany et al.~2017) for discrete probability distributions, to continuous probability distributions and show that such programs yield generative models of continuous probabilistic databases.

قواعد البيانات

Generative Datalog with Continuous Distributions

122 - Martin Grohe , Benjamin Lucien Kaminski , Joost-Pieter Katoen 2020

Arguing for the need to combine declarative and probabilistic programming, Barany et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a purely declarative probabilistic programming language. We revisit this language and pro pose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes - standard notions from probability theory. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Barany et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.

قواعد البيانات المنطق في علوم الحاسوب

Linear Diophantine Equations, Group CSPs, and Graph Isomorphism

203 - Christoph Berkholz , Martin Grohe 2016

In recent years, we have seen several approaches to the graph isomorphism problem based on generic mathematical programming or algebraic (Grobner basis) techniques. For most of these, lower bounds have been established. In fact, it has been shown tha t the pairs of nonisomorphic CFI-graphs (introduced by Cai, Furer, and Immerman in 1992 as hard examples for the combinatorial Weisfeiler-Leman algorithm) cannot be distinguished by these mathematical algorithms. A notable exception were the algebraic algorithms over the field GF(2), for which no lower bound was known. Another, in some way even stronger, approach to graph isomorphism testing is based on solving systems of linear Diophantine equations (that is, linear equations over the integers), which is known to be possible in polynomial time. So far, no lower bounds for this approach were known. Lower bounds for the algebraic algorithms can best be proved in the framework of proof complexity, where they can be phrased as lower bounds for algebraic proof systems such as Nullstellensatz or the (more powerful) polynomial calculus. We give new hard examples for these systems: families of pairs of non-isomorphic graphs that are hard to distinguish by polynomial calculus proofs simultaneously over all prime fields, including GF(2), as well as examples that are hard to distinguish by the systems-of-linear-Diophantine-equations approach. In a previous paper, we observed that the CFI-graphs are closely related to what we call group CSPs: constraint satisfaction problems where the constraints are membership tests in some coset of a subgroup of a cartesian power of a base group (Z_2 in the case of the classical CFI-graphs). Our new examples are also based on group CSPs (for Abelian groups), but here we extend the CSPs by a few non-group constraints to obtain even harder instances for graph isomorphism.

التعقيد الحسابي المنطق في علوم الحاسوب

Tight Lower and Upper Bounds for the Complexity of Canonical Colour Refinement

96 - Christoph Berkholz , Paul Bonsma , Martin Grohe 2015

An assignment of colours to the vertices of a graph is stable if any two vertices of the same colour have identically coloured neighbourhoods. The goal of colour refinement is to find a stable colouring that uses a minimum number of colours. This is a widely used subroutine for graph isomorphism testing algorithms, since any automorphism needs to be colour preserving. We give an $O((m+n)log n)$ algorithm for finding a canonical version of such a stable colouring, on graphs with $n$ vertices and $m$ edges. We show that no faster algorithm is possible, under some modest assumptions about the type of algorithm, which captures all known colour refinement algorithms.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي

Limitations of Algebraic Approaches to Graph Isomorphism Testing

127 - Christoph Berkholz , Martin Grohe 2015

We investigate the power of graph isomorphism algorithms based on algebraic reasoning techniques like Grobner basis computation. The idea of these algorithms is to encode two graphs into a system of equations that are satisfiable if and only if if th e graphs are isomorphic, and then to (try to) decide satisfiability of the system using, for example, the Grobner basis algorithm. In some cases this can be done in polynomial time, in particular, if the equations admit a bounded degree refutation in an algebraic proof systems such as Nullstellensatz or polynomial calculus. We prove linear lower bounds on the polynomial calculus degree over all fields of characteristic different from 2 and also linear lower bounds for the degree of Positivstellensatz calculus derivations. We compare this approach to recently studied linear and semidefinite programming approaches to isomorphism testing, which are known to be related to the combinatorial Weisfeiler-Lehman algorithm. We exactly characterise the power of the Weisfeiler-Lehman algorithm in terms of an algebraic proof system that lies between degree-k Nullstellensatz and degree-k polynomial calculus.

التعقيد الحسابي المنطق في علوم الحاسوب

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد