Multivariate time series (MTS) data, such as time course gene expression data in genomics, are often collected to study the dynamic nature of the underlying system. These data provide important information about the causal dependencies among a set of random variables. In this paper, we introduce a computationally efficient algorithm to learn directed acyclic graphs (DAGs) from MTS data, focusing on learning the local structure of a given target variable. Our algorithm iteratively learns all parents (P), all children (C), and some descendants (D) (PCD) of the target, utilizing the time order of the variables to orient the edges. This time series PCD-PCD algorithm (tsPCD-PCD) extends the previous PCD-PCD algorithm to dependent observations and uses composite likelihood ratio tests (CLRTs) for testing conditional independence. We present the asymptotic distribution of the CLRT statistic and show that tsPCD-PCD is guaranteed to recover the true DAG structure when the faithfulness condition holds and the tests correctly reject the null hypotheses. Simulation studies show that the CLRTs are valid and perform well even when the sample sizes are small. In addition, the tsPCD-PCD algorithm outperforms the PCD-PCD algorithm in recovering local graph structures. We illustrate the algorithm by analyzing a time course gene expression dataset related to mouse T-cell activation.
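The abstract does not spell out the PCD-PCD steps, so the following is only a hedged Python sketch of the general idea: local discovery of a target variable's parent/child set via conditional independence tests, with edges oriented by time order. The partial-correlation test and the function names (partial_corr_test, local_structure) are illustrative stand-ins, not the authors' CLRT or tsPCD-PCD.

```python
# Illustrative sketch only: CI-test-based local structure discovery with
# time-order orientation. Not the authors' tsPCD-PCD algorithm or their CLRT.
from itertools import combinations

import numpy as np
from scipy import stats


def partial_corr_test(data, x, y, z, alpha=0.05):
    """Return True if x _||_ y | z is NOT rejected (Fisher-z partial correlation)."""
    n = data.shape[0]
    z = list(z)
    if z:
        Z = np.column_stack([np.ones(n), data[:, z]])
        rx = data[:, x] - Z @ np.linalg.lstsq(Z, data[:, x], rcond=None)[0]
        ry = data[:, y] - Z @ np.linalg.lstsq(Z, data[:, y], rcond=None)[0]
    else:
        rx, ry = data[:, x], data[:, y]
    r = np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999)
    stat = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(max(n - len(z) - 3, 1))
    p_value = 2 * (1 - stats.norm.cdf(abs(stat)))
    return p_value > alpha  # True = treat as conditionally independent


def local_structure(data, target, time_order, alpha=0.05, max_cond=2):
    """Candidate parents/children of `target`, oriented by the variables' time order."""
    p = data.shape[1]
    pcd = [v for v in range(p) if v != target]
    changed = True
    while changed:
        changed = False
        for v in list(pcd):
            others = [w for w in pcd if w != v]
            for k in range(min(max_cond, len(others)) + 1):
                if any(partial_corr_test(data, target, v, s, alpha)
                       for s in combinations(others, k)):
                    pcd.remove(v)  # v is separated from target by some small set
                    changed = True
                    break
    # Orientation by time order: an earlier variable cannot be a child of a later one.
    parents = [v for v in pcd if time_order[v] < time_order[target]]
    children = [v for v in pcd if time_order[v] > time_order[target]]
    return parents, children


# Toy usage: a chain x0 -> x1 -> x2 measured at time points 0 < 1 < 2.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.5 * rng.normal(size=500)
x2 = x1 + 0.5 * rng.normal(size=500)
data = np.column_stack([x0, x1, x2])
print(local_structure(data, target=1, time_order=[0, 1, 2]))  # expected ([0], [2])
```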
We introduce a structure for the directed acyclic graph (DAG) and a mechanism design based on that structure so that peers can reach consensus at large scale based on proof of work (PoW). We also design a mempool transaction assignment method based on the DAG structure that renders negligible the probability that a transaction is processed by more than one miner. The result is a significant scale-up of capacity without sacrificing security or decentralization.
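The abstract does not give the assignment rule, so the snippet below is only a generic illustration of the underlying idea: deterministically partitioning the mempool so that each transaction maps to a single miner, making duplicate processing unlikely. It is not the paper's DAG-based assignment method.

```python
# Generic sketch: hash-based mempool partitioning so each transaction is handled
# by exactly one miner bucket. Illustrative only; not the paper's DAG-based rule.
import hashlib


def assigned_miner(tx_id: str, miner_ids: list[str]) -> str:
    """Deterministically map a transaction id to one miner."""
    digest = hashlib.sha256(tx_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(miner_ids)
    return miner_ids[bucket]


miners = ["miner-a", "miner-b", "miner-c", "miner-d"]
mempool = [f"tx-{i}" for i in range(10)]
for tx in mempool:
    print(tx, "->", assigned_miner(tx, miners))
```

Because the mapping is deterministic, every peer computes the same assignment without coordination, which is the property that keeps redundant work negligible.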
We develop a Bregman proximal gradient method for structure learning on linear structural causal models. While the problem is non-convex, has high curvature and is in fact NP-hard, Bregman gradient methods allow us to neutralize at least part of the impact of curvature by measuring smoothness against a highly nonlinear kernel. This allows the method to make longer steps and significantly improves convergence. Each iteration requires solving a Bregman proximal step which is convex and efficiently solvable for our particular choice of kernel. We test our method on various synthetic and real data sets.
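As a minimal sketch of the mechanics of a Bregman (mirror-descent style) gradient method, the example below measures smoothness against the negative-entropy kernel on the probability simplex, for which the Bregman proximal step has a closed form (exponentiated gradient). The paper's kernel for linear structural causal models is different, and the toy objective here is chosen only for illustration.

```python
# Sketch of a Bregman gradient method with a nonquadratic kernel.
# Kernel assumed here: h(x) = sum_i x_i log x_i on the simplex, whose Bregman
# proximal step is the exponentiated-gradient update. Not the paper's kernel.
import numpy as np


def bregman_step_entropy(x, grad, step):
    """Closed-form Bregman proximal step for the negative-entropy kernel."""
    y = x * np.exp(-step * grad)
    return y / y.sum()


def bregman_gradient_method(grad_f, x0, step=0.1, iters=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = bregman_step_entropy(x, grad_f(x), step)
    return x


# Toy usage: minimize 0.5 * x' A x over the simplex; the minimizer is
# proportional to 1 / diag(A), i.e. roughly (0.73, 0.18, 0.08).
A = np.diag([1.0, 4.0, 9.0])
x_star = bregman_gradient_method(lambda x: A @ x, x0=np.ones(3) / 3)
print(np.round(x_star, 3))
```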
Missing data is a pervasive problem in data analyses, resulting in datasets that contain censored realizations of a target distribution. Many approaches to inference on the target distribution using censored observed data rely on missing data models represented as a factorization with respect to a directed acyclic graph. In this paper we consider the identifiability of the target distribution within this class of models, and show that the most general identification strategies proposed so far leave a significant gap: they fail to identify a wide class of identifiable distributions. To address this gap, we propose a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm, developed in the context of causal inference, in order to obtain identification.
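To fix ideas, here is a hedged sketch of the kind of factorization such missing data models use; the notation is assumed (it is standard in this literature but not given in the abstract): $X^{(1)}$ are the potentially missing variables, $R$ the missingness indicators, and the observed proxy $X_i$ equals $X_i^{(1)}$ when $R_i = 1$ and is missing otherwise. Identifiability asks whether the target law is a functional of the observed-data distribution over $(X, R)$.

```latex
% Hedged illustration of a missing data DAG factorization (notation assumed).
\underbrace{p\big(X^{(1)}, R\big)}_{\text{full law}}
  \;=\; \prod_i p\big(X_i^{(1)} \mid \mathrm{pa}_G(X_i^{(1)})\big)\,
        \prod_j p\big(R_j \mid \mathrm{pa}_G(R_j)\big),
\qquad
\text{target law: } p\big(X^{(1)}\big).
```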
The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and that has an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation sequencing technologies in bioinformatics. The first problem (MinPCRP), given a DAG and a set of pairs of vertices, asks for a minimum cardinality set of paths covering all the vertices such that both vertices of each pair belong to the same path. For this problem, we show that, while it is NP-hard to decide whether there exists a solution consisting of at most three paths, it is possible to decide in polynomial time whether a solution consisting of at most two paths exists. The second problem (MaxRPSP), given a DAG and a set of pairs of vertices, asks for a path containing the maximum number of the given pairs of vertices. We show its NP-hardness and also its W[1]-hardness when parameterized by the number of covered pairs. On the positive side, we give a fixed-parameter algorithm when the parameter is the maximum overlapping degree, a natural parameter in the bioinformatics applications of the problem.
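For context, the unconstrained problem admits the classical reduction to maximum bipartite matching: the minimum number of vertex-disjoint covering paths equals the number of vertices minus the size of a maximum matching between "out" and "in" copies of the vertices. The sketch below (using networkx) shows only this baseline; it does not implement the constrained variants MinPCRP and MaxRPSP.

```python
# Classical Minimum Path Cover on a DAG via maximum bipartite matching.
# Baseline sketch only; the paired constraints of MinPCRP/MaxRPSP are not handled.
import networkx as nx
from networkx.algorithms import bipartite


def min_path_cover_size(dag: nx.DiGraph) -> int:
    left = {v: ("out", v) for v in dag.nodes}
    right = {v: ("in", v) for v in dag.nodes}
    bip = nx.Graph()
    bip.add_nodes_from(left.values(), bipartite=0)
    bip.add_nodes_from(right.values(), bipartite=1)
    bip.add_edges_from((left[u], right[v]) for u, v in dag.edges)
    matching = bipartite.maximum_matching(bip, top_nodes=set(left.values()))
    # The returned dict contains both directions of each matched pair.
    return dag.number_of_nodes() - len(matching) // 2


# Toy usage: a 4-vertex DAG coverable by 2 vertex-disjoint paths.
g = nx.DiGraph([(1, 2), (2, 3), (1, 4)])
print(min_path_cover_size(g))  # 2, e.g. paths 1->2->3 and 4
```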
Let $G$ be a graph and $S, T \subseteq V(G)$ be (possibly overlapping) sets of terminals, $|S|=|T|=k$. We are interested in computing a vertex sparsifier for terminal cuts in $G$, i.e., a graph $H$ on a smallest possible number of vertices, where $S \cup T \subseteq V(H)$ and such that for every $A \subseteq S$ and $B \subseteq T$ the size of a minimum $(A,B)$-vertex cut is the same in $G$ as in $H$. We assume that our graphs are unweighted and that terminals may be part of the min-cut. In previous work, Kratsch and Wahlström (FOCS 2012/JACM 2020) used connections to matroid theory to show that a vertex sparsifier $H$ with $O(k^3)$ vertices can be computed in randomized polynomial time, even for arbitrary digraphs $G$. However, since then, no improvements on the size $O(k^3)$ have been shown. In this paper, we draw inspiration from the renowned Bollobás Two-Families Theorem in extremal combinatorics and introduce the use of total orderings into Kratsch and Wahlström's methods. This new perspective allows us to construct a sparsifier $H$ of $\Theta(k^2)$ vertices for the case that $G$ is a DAG. We also show how to compute $H$ in time near-linear in the size of $G$, improving on the previous $O(n^{\omega+1})$. Furthermore, $H$ recovers the closest min-cut in $G$ for every partition $(A,B)$, which was not previously known. Finally, we show that a sparsifier of size $\Omega(k^2)$ is required, both for DAGs and for undirected edge cuts.
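As an illustration of the quantity a sparsifier must preserve, the sketch below computes a minimum $(A,B)$-vertex cut (with terminals themselves deletable) via the standard vertex-splitting max-flow reduction. It is only a way to check the defining property on small instances; it is not the paper's sparsifier construction.

```python
# Sketch: minimum (A,B)-vertex cut size, terminals deletable, via vertex splitting
# and max-flow. Useful only to verify the sparsifier property on toy instances.
import networkx as nx


def min_vertex_cut_size(g: nx.DiGraph, A, B) -> int:
    h = nx.DiGraph()
    for v in g.nodes:
        h.add_edge(("in", v), ("out", v), capacity=1)  # deleting v cuts this edge
    for u, v in g.edges:
        h.add_edge(("out", u), ("in", v))  # no capacity attribute = infinite capacity
    for a in A:
        h.add_edge("source", ("in", a))  # entering before a's unit edge lets a itself be cut
    for b in B:
        h.add_edge(("out", b), "sink")
    return nx.maximum_flow_value(h, "source", "sink")


# Toy usage on a small DAG: separate A = {1, 2} from B = {4}.
g = nx.DiGraph([(1, 3), (2, 3), (3, 4)])
print(min_vertex_cut_size(g, A={1, 2}, B={4}))  # 1 (e.g., delete vertex 3)
```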