We study whether and how we can model a joint distribution $p(x,z)$ using two conditional models $p(x|z)$ and $q(z|x)$ that form a cycle. This is motivated by the observation that deep generative models, in addition to a likelihood model $p(x|z)$, often also use an inference model $q(z|x)$ for data representation, but they rely on a usually uninformative prior distribution $p(z)$ to define a joint distribution, which may cause problems such as posterior collapse and manifold mismatch. To explore the possibility of modeling a joint distribution using only $p(x|z)$ and $q(z|x)$, we study their compatibility and determinacy, corresponding to the existence and uniqueness of a joint distribution whose conditional distributions coincide with them. We develop a general theory for novel and operable equivalence criteria for compatibility, and sufficient conditions for determinacy. Based on the theory, we propose the CyGen framework for cyclic-conditional generative modeling, including methods to enforce compatibility and use the determined distribution to fit and generate data. With the prior constraint removed, CyGen better fits data and captures more representative features, supported by experiments showing better generation and downstream classification performance.
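To make the notion of compatibility concrete, the following minimal sketch checks it in the simplest setting only: finite discrete $x$ and $z$ with strictly positive conditional tables. It uses the classical full-support criterion (the ratio $p(x|z)/q(z|x)$ must factor as $u(x)\,v(z)$), not the general criteria or the CyGen method described in the abstract. The function names `conditionals_from_joint` and `is_compatible` and the example tables are hypothetical.

```python
from itertools import product


def conditionals_from_joint(joint):
    """Return (p, q) with p[x][z] = p(x|z) and q[x][z] = q(z|x) for a joint table joint[x][z]."""
    nx, nz = len(joint), len(joint[0])
    pz = [sum(joint[x][z] for x in range(nx)) for z in range(nz)]  # marginal of z
    px = [sum(joint[x]) for x in range(nx)]                        # marginal of x
    p = [[joint[x][z] / pz[z] for z in range(nz)] for x in range(nx)]
    q = [[joint[x][z] / px[x] for z in range(nz)] for x in range(nx)]
    return p, q


def is_compatible(p, q, tol=1e-9):
    """Full-support check (assumption: all entries > 0).

    If a joint exists, p(x|z) / q(z|x) equals marginal(x) / marginal(z),
    so the ratio table must have rank one.
    """
    nx, nz = len(p), len(p[0])
    r = [[p[x][z] / q[x][z] for z in range(nz)] for x in range(nx)]
    # Rank-one test via 2x2 minors anchored at the (0, 0) entry.
    return all(
        abs(r[x][z] * r[0][0] - r[x][0] * r[0][z]) <= tol
        for x, z in product(range(nx), range(nz))
    )


# Conditionals derived from a genuine joint are compatible by construction.
joint = [[0.1, 0.2], [0.3, 0.4]]
p, q = conditionals_from_joint(joint)

# Replacing p(x|z) by an unrelated table (columns still sum to 1) breaks compatibility.
p_other = [[0.5, 0.9], [0.5, 0.1]]
```

Here `is_compatible(p, q)` holds while `is_compatible(p_other, q)` does not, which illustrates that two arbitrary conditional models need not come from any single joint distribution.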
Let $k,l,m,n$, and $\mu$ be positive integers. A $\mathbb{Z}_\mu$--{\it scheme of valency} $(k,l)$ and {\it order} $(m,n)$ is an $m \times n$ array $(S_{ij})$ of subsets $S_{ij} \subseteq \mathbb{Z}_\mu$ such that for each row and column one has $\sum_{j=1}^n |S_{ij}| = k$ and $\sum_{i=1}^m |S_{ij}| = l$, respectively. Any such scheme is an algebraic equivalent of a $(k,l)$-semi-regular bipartite voltage graph with $n$ and $m$ vertices in the bipartition sets and voltages coming from the cyclic group $\mathbb{Z}_\mu$. We are interested in the subclass of $\mathbb{Z}_\mu$--schemes that are characterized by the property $a - b + c - d \;\not\equiv\; 0$ (mod $\mu$) for all $a \in S_{ij}$, $b \in S_{ih}$, $c \in S_{gh}$, and $d \in S_{gj}$ where $i,g \in \{1,...,m\}$ and $j,h \in \{1,...,n\}$ need not be distinct. These $\mathbb{Z}_\mu$--schemes can be used to represent adjacency matrices of regular graphs of girth $\ge 5$ and semi-regular bipartite graphs of girth $\ge 6$. For suitable $\rho, \sigma \in \mathbb{N}$ with $\rho k = \sigma l$, they also represent incidence matrices for polycyclic $(\rho \mu_k, \sigma \mu_l)$ configurations and, in particular, for all known Desarguesian elliptic semiplanes. Partial projective closures yield {\it mixed $\mathbb{Z}_\mu$-schemes}, which allow new constructions for Kr\v{c}adinac's sporadic configuration of type $(34_6)$ and Balbuena's bipartite $(q-1)$-regular graphs of girth 6 on as few as $2(q^2-q-2)$ vertices, with $q$ ranging over prime powers. Besides some new results, this survey essentially furnishes new proofs in terms of (mixed) $\mathbb{Z}_\mu$--schemes for ad-hoc constructions used thus far.
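The two defining conditions can be checked mechanically. The sketch below is an illustration under stated assumptions, not the survey's own procedure: a scheme is stored as a nested list of Python sets, and the function names `has_valency` and `satisfies_four_term_condition` are hypothetical. Read literally, the four-term condition would be violated by trivial choices such as $a=b=c=d$ with $i=g$ and $j=h$, so the sketch assumes that quadruples in which two consecutive entries coincide (backtracking walks in the voltage graph) are excluded.

```python
from itertools import product


def has_valency(S, k, l):
    """Check that every row sum of |S_ij| equals k and every column sum equals l."""
    m, n = len(S), len(S[0])
    rows_ok = all(sum(len(S[i][j]) for j in range(n)) == k for i in range(m))
    cols_ok = all(sum(len(S[i][j]) for i in range(m)) == l for j in range(n))
    return rows_ok and cols_ok


def satisfies_four_term_condition(S, mu):
    """Check a - b + c - d != 0 (mod mu) over all non-degenerate quadruples."""
    m, n = len(S), len(S[0])
    for i, g in product(range(m), repeat=2):
        for j, h in product(range(n), repeat=2):
            for a, b, c, d in product(S[i][j], S[i][h], S[g][h], S[g][j]):
                # Assumption: skip quadruples where two consecutive entries
                # are the same element of the same cell.
                if ((j, a) == (h, b) or (i, b) == (g, c)
                        or (h, c) == (j, d) or (g, d) == (i, a)):
                    continue
                if (a - b + c - d) % mu == 0:
                    return False
    return True


# A 1 x 1 scheme of valency (3, 3): all nonzero differences of {0, 1, 3}
# are distinct modulo 7, but 3 - 0 and 0 - 3 coincide modulo 6.
S = [[{0, 1, 3}]]
ok_mod_7 = has_valency(S, 3, 3) and satisfies_four_term_condition(S, 7)
ok_mod_6 = satisfies_four_term_condition(S, 6)
```

With these inputs the check succeeds for $\mu = 7$ and fails for $\mu = 6$, since $3 - 0 + 3 - 0 \equiv 0$ (mod 6).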
While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves. Agents that assume their partner to be optimal or similar to them can converge to coordination protocols that fail to understand and be understood by humans. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model that it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from having the agent adapt to the human's gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them. Code is available at https://github.com/HumanCompatibleAI/overcooked_ai.
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via a generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.
A common assumption in causal modeling posits that the data is generated by a set of independent mechanisms, and algorithms should aim to recover this structure. Standard unsupervised learning, however, is often concerned with training a single model to capture the overall distribution or aspects thereof. Inspired by clustering approaches, we consider mixtures of implicit generative models that ``disentangle'' the independent generative mechanisms underlying the data. Relying on an additional set of discriminators, we propose a competitive training procedure in which the models only need to capture the portion of the data distribution from which they can produce realistic samples. As a by-product, each model is simpler and faster to train. We empirically show that our approach splits the training distribution in a sensible way and increases the quality of the generated samples.
A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively clean up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we avoid the vanishing gradient problem by training a generative distributed memory without simulating the attractor dynamics. Based on the idea of memory writing as inference, as proposed in the Kanerva Machine, we show that a likelihood-based Lyapunov function emerges from maximising the variational lower-bound of a generative memory. Experiments show that it converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model.