In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision avoidance in self-driving cars. However, in many settings, only sporadic observations of agents may be available in a given trajectory sequence. For instance, in football, subsets of players may move in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, combines forward- and backward-directed information with graph networks and variational autoencoders to learn a distribution over imputed trajectories. We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model in the off-screen player state estimation setting. We show that our method outperforms several state-of-the-art approaches, including those hand-crafted for football.
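To make the bidirectional idea concrete, here is a minimal sketch of a Graph-Imputer-style model in PyTorch, assuming a fully connected agent graph; the module names, dimensions, and message-passing scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraphMessagePass(nn.Module):
    """One round of message passing over a fully connected agent graph."""
    def __init__(self, dim):
        super().__init__()
        self.edge = nn.Linear(2 * dim, dim)
        self.node = nn.Linear(2 * dim, dim)

    def forward(self, h):                          # h: (n_agents, dim)
        n = h.size(0)
        send = h.unsqueeze(1).expand(n, n, -1)     # sender states
        recv = h.unsqueeze(0).expand(n, n, -1)     # receiver states
        msg = torch.relu(self.edge(torch.cat([send, recv], -1))).mean(1)
        return torch.relu(self.node(torch.cat([h, msg], -1)))

class BiGraphVAEImputer(nn.Module):
    def __init__(self, obs_dim=2, hid=64, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(obs_dim, hid)
        self.fwd = nn.GRUCell(hid, hid)            # forward-in-time recursion
        self.bwd = nn.GRUCell(hid, hid)            # backward-in-time recursion
        self.gnn = GraphMessagePass(hid)
        self.to_mu = nn.Linear(2 * hid, z_dim)
        self.to_logvar = nn.Linear(2 * hid, z_dim)
        self.dec = nn.Linear(z_dim, obs_dim)

    def forward(self, traj, mask):
        # traj: (T, n_agents, obs_dim); mask: (T, n_agents, 1), 1 = observed
        T, n, _ = traj.shape
        hf = traj.new_zeros(n, self.fwd.hidden_size)
        hb = traj.new_zeros(n, self.bwd.hidden_size)
        fwd, bwd = [], [None] * T
        for t in range(T):                         # forward pass over time
            hf = self.fwd(self.enc(traj[t]), self.gnn(hf))
            fwd.append(hf)
        for t in reversed(range(T)):               # backward pass over time
            hb = self.bwd(self.enc(traj[t]), self.gnn(hb))
            bwd[t] = hb
        out = []
        for t in range(T):                         # per-step variational head
            h = torch.cat([fwd[t], bwd[t]], -1)
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            recon = self.dec(z)
            # keep true observations; impute only the missing entries
            out.append(mask[t] * traj[t] + (1 - mask[t]) * recon)
        return torch.stack(out)                    # (T, n_agents, obs_dim)
```

Sampling z per step is what yields a distribution over imputed trajectories rather than a single point estimate.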
Time series imputation is a fundamental task for understanding time series with missing data. Existing methods either do not directly handle irregularly-sampled data or degrade severely with sparsely observed data. In this work, we reformulate time series as permutation-equivariant sets and propose a novel imputation model NRTSI that does not impose any recurrent structures. Taking advantage of the permutation equivariant formulation, we design a principled and efficient hierarchical imputation procedure. In addition, NRTSI can directly handle irregularly-sampled time series, perform multiple-mode stochastic imputation, and handle data with partially observed dimensions. Empirically, we show that NRTSI achieves state-of-the-art performance across a wide range of time series imputation benchmarks.
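The permutation-equivariant view can be sketched as follows: each observation becomes a (time, value) token processed by a Transformer encoder (permutation-equivariant when no positional encoding is used), and a hierarchical loop imputes the gaps nearest to observed points first. All sizes and the scheduling rule here are assumptions for illustration, not NRTSI's actual design.

```python
import torch
import torch.nn as nn

class SetImputer(nn.Module):
    def __init__(self, val_dim=1, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed_obs = nn.Linear(1 + val_dim, d_model)  # (t, x) token
        self.embed_qry = nn.Linear(1, d_model)            # time-only query
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.cross = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.head = nn.Linear(d_model, val_dim)

    def forward(self, obs_t, obs_x, qry_t):
        # obs_t: (B, N, 1), obs_x: (B, N, v), qry_t: (B, M, 1)
        ctx = self.encoder(self.embed_obs(torch.cat([obs_t, obs_x], -1)))
        q = self.embed_qry(qry_t)
        att, _ = self.cross(q, ctx, ctx)          # queries attend to the set
        return self.head(att)                     # (B, M, v) imputed values

def hierarchical_impute(model, obs_t, obs_x, missing_t, rounds=3):
    # Unbatched tensors: obs_t (N, 1), obs_x (N, v), missing_t (M, 1).
    # Fill the gaps nearest to observations first, then treat the new
    # estimates as observed and repeat.
    while rounds > 0 and missing_t.size(0) > 0:
        d = (missing_t - obs_t.T).abs().min(-1).values   # (M,) gap distances
        k = max(1, missing_t.size(0) // 2)
        idx = d.topk(k, largest=False).indices           # closest half
        qry_t = missing_t[idx]
        qry_x = model(obs_t[None], obs_x[None], qry_t[None])[0]
        obs_t = torch.cat([obs_t, qry_t])                # promote to observed
        obs_x = torch.cat([obs_x, qry_x])
        keep = torch.ones(missing_t.size(0), dtype=torch.bool)
        keep[idx] = False
        missing_t = missing_t[keep]
        rounds -= 1
    return obs_t, obs_x
```

Because the encoder treats observations as a set, arbitrary and irregular sampling times pose no special difficulty.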
Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Furthermore, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.
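The split-level structure can be summarized in schematic Python; every function here (make_team, evaluate_team, rl_update, mutate) is a placeholder for components MERL would supply, not its actual API.

```python
import copy
import random
import torch

def merl_loop(make_team, evaluate_team, rl_update, pop_size=10,
              generations=100, migrate_every=5):
    population = [make_team() for _ in range(pop_size)]
    rl_team = make_team()                    # gradient-trained team
    for gen in range(generations):
        # 1. gradient step: maximize the dense agent-specific rewards
        rl_update(rl_team)
        # 2. evaluate everyone on the sparse team-based objective
        scores = [evaluate_team(t) for t in population]
        # 3. selection + mutation (simple truncation EA for illustration)
        ranked = [t for _, t in sorted(zip(scores, population),
                                       key=lambda p: p[0], reverse=True)]
        elites = ranked[:pop_size // 2]
        population = elites + [mutate(copy.deepcopy(random.choice(elites)))
                               for _ in range(pop_size - len(elites))]
        # 4. periodic migration: inject the RL team into the population,
        #    transferring skills learned from the dense reward
        if gen % migrate_every == 0:
            population[-1] = copy.deepcopy(rl_team)
    return max(population, key=evaluate_team)

def mutate(team, sigma=0.1):
    # placeholder Gaussian parameter perturbation (assumes nn.Module teams)
    for p in team.parameters():
        p.data.add_(sigma * torch.randn_like(p))
    return team
```

The key point is that neither optimizer's objective is compromised: the population is selected only on the team reward, while the gradient learner only ever sees the dense one.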
Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and an exponentially growing policy space. The problem becomes even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed rewards. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. In addition, we propose a new experience replay mechanism to alleviate the sparsity of transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, Fever Basketball Defense.
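A minimal sketch of the underlying temporal abstraction follows, under assumed interfaces (env, high_policies, low_skills); storing one abstract transition per k low-level steps is what eases reward sparsity at the high level.

```python
def run_episode(env, high_policies, low_skills, k=10):
    """Every k steps a high-level policy picks a low-level skill per agent."""
    obs = env.reset()
    high_buffer = []                 # (obs, skills, R, next_obs) tuples
    done = False
    while not done:
        # high level: choose one previously learned skill per agent
        skills = [pi.select(o) for pi, o in zip(high_policies, obs)]
        start_obs, acc_reward = obs, 0.0
        for _ in range(k):           # low level acts at the fine time scale
            actions = [low_skills[s].act(o) for s, o in zip(skills, obs)]
            obs, reward, done, _ = env.step(actions)
            acc_reward += reward     # sparse team reward accumulates
            if done:
                break
        # one abstract transition per k steps densifies the reward signal
        high_buffer.append((start_obs, skills, acc_reward, obs))
    return high_buffer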
Electronic health records (EHR) consist of longitudinal clinical observations characterized by sparsity, irregularity, and high dimensionality, which are major obstacles to drawing reliable downstream clinical conclusions. Although many imputation methods exist to tackle these issues, most of them ignore correlated features and temporal dynamics, and entirely set aside the uncertainty of the estimates. Since missing-value estimates carry the risk of being inaccurate, a method should handle less certain information differently from reliable data. In that regard, the uncertainty in estimating missing values can serve as a fidelity score, further utilized to alleviate the risk of biased missing-value estimates. In this work, we propose a novel variational-recurrent imputation network that unifies an imputation network and a prediction network by taking into account correlated features, temporal dynamics, and uncertainty. Specifically, we leverage a deep generative model for imputation, based on the distribution among variables, and a recurrent imputation network to exploit temporal relations, in conjunction with the uncertainty. We validate the effectiveness of our proposed model on two publicly available real-world EHR datasets, PhysioNet Challenge 2012 and MIMIC-III, and compare the results with other competing state-of-the-art methods in the literature.
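The uncertainty-as-fidelity idea might be sketched as follows in PyTorch: a variational imputer returns a mean and variance per missing entry, and the downstream recurrent predictor down-weights inputs in proportion to their uncertainty. The gating rule and module shapes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class UncertaintyAwareImputer(nn.Module):
    def __init__(self, n_feat, hid=64, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(n_feat, hid)
        self.mu = nn.Linear(hid, z_dim)
        self.logvar = nn.Linear(hid, z_dim)
        self.dec_mu = nn.Linear(z_dim, n_feat)
        self.dec_logvar = nn.Linear(z_dim, n_feat)  # per-feature uncertainty
        self.rnn = nn.GRU(2 * n_feat, hid, batch_first=True)
        self.out = nn.Linear(hid, 1)                # clinical outcome head

    def forward(self, x, mask):
        # x: (B, T, n_feat) with zeros at missing spots; mask: same shape
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_hat, lv = self.dec_mu(z), self.dec_logvar(z)
        imputed = mask * x + (1 - mask) * x_hat
        # fidelity: observed entries are fully trusted, imputed ones are
        # trusted in proportion to the decoder's confidence
        fidelity = mask + (1 - mask) * torch.exp(-lv).clamp(max=1.0)
        out, _ = self.rnn(torch.cat([imputed * fidelity, mask], -1))
        return torch.sigmoid(self.out(out[:, -1]))  # e.g. mortality risk
```

Feeding the mask alongside the scaled values lets the recurrent predictor additionally learn which inputs were imputed in the first place.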
Learning when to communicate, and doing so effectively, is essential in multi-agent tasks. Recent work shows that continuous communication allows efficient training with back-propagation in multi-agent scenarios, but has been restricted to fully cooperative tasks. In this paper, we present the Individualized Controlled Continuous Communication Model (IC3Net), which trains more efficiently than a simple continuous communication model and can be applied to semi-cooperative and competitive settings as well as cooperative ones. IC3Net controls continuous communication with a gating mechanism and uses individualized rewards for each agent to gain better performance and scalability while fixing credit assignment issues. On a variety of tasks, including StarCraft: Brood War exploration and combat scenarios, we show that our network yields better performance and faster convergence than the baselines as the scale increases. Our results indicate that IC3Net agents learn when to communicate based on the scenario and profitability.
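A minimal sketch of the gating mechanism, assuming PyTorch and mean-pooled communication among the agents whose gates are open; the wiring details are assumptions for illustration, not the released IC3Net code.

```python
import torch
import torch.nn as nn

class GatedCommCell(nn.Module):
    """Each agent broadcasts its LSTM state only when its gate fires."""
    def __init__(self, obs_dim, hid=64):
        super().__init__()
        self.cell = nn.LSTMCell(obs_dim + hid, hid)
        self.gate = nn.Linear(hid, 2)           # communicate / stay silent

    def forward(self, obs, h, c):
        # obs: (n_agents, obs_dim); h, c: (n_agents, hid)
        logits = self.gate(h)
        g = torch.distributions.Categorical(logits=logits).sample()
        g = g.float().unsqueeze(-1)             # 1 = this agent communicates
        # mean of hidden states of the *other* gated-open agents
        summed = (g * h).sum(0, keepdim=True) - g * h
        denom = (g.sum() - g).clamp(min=1.0)
        comm = summed / denom
        h, c = self.cell(torch.cat([obs, comm], -1), (h, c))
        return h, c, g                          # g also feeds a gate loss
```

Because the gate is a sampled discrete action, it would be trained with a policy-gradient term on each agent's individualized reward, while the continuous communication channel itself remains fully differentiable.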