Plant traits are key to understanding and predicting the adaptation of ecosystems to environmental changes. This motivates the TRY project, which aims to construct a global database of plant traits and to become a standard resource for the ecological community. Despite its unprecedented coverage, a large percentage of missing data substantially constrains joint trait analysis. Meanwhile, the trait data are characterized by the hierarchical phylogenetic structure of the plant kingdom. While factorization-based matrix completion techniques have been widely used to address the missing data problem, traditional matrix factorization methods are unable to leverage the phylogenetic structure. We propose hierarchical probabilistic matrix factorization (HPMF), which effectively uses hierarchical phylogenetic information for trait prediction. Through experiments, we demonstrate HPMF's high accuracy, the effectiveness of incorporating hierarchical structure, and its ability to capture trait correlation.
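The matrix-completion setting described in the abstract above can be illustrated with a minimal sketch. It fits plain probabilistic matrix factorization (a MAP estimate under Gaussian priors, trained by stochastic gradient descent on the observed cells only) and does not include the hierarchical phylogenetic priors that distinguish HPMF. The function names `pmf_fit` and `predict`, the hyperparameters, and the synthetic rank-1 "species by trait" table are all assumptions made for illustration.

```python
import random

def predict(U, V, i, j):
    """Predicted value of cell (i, j): inner product of its latent vectors."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def pmf_fit(observed, n_rows, n_cols, rank=2, lr=0.02, reg=0.01, epochs=500, seed=0):
    """Plain PMF (not hierarchical): SGD over observed (row, col, value) triples.

    Missing cells are simply absent from `observed`; `reg` plays the role of
    the Gaussian prior precision in the MAP objective.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    entries = list(observed)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(entries)
        for i, j, x in entries:
            err = x - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

# Synthetic (hypothetical) rank-1 table: 6 "species" x 4 "traits".
row_f = [1.0, 2.0, 3.0, 1.5, 2.5, 0.5]
col_f = [1.0, 0.5, 2.0, 1.5]
full = {(i, j): r * c for i, r in enumerate(row_f) for j, c in enumerate(col_f)}
missing = {(0, 3), (2, 1), (4, 2), (5, 0)}  # cells hidden from the model
observed = [(i, j, x) for (i, j), x in full.items() if (i, j) not in missing]

U, V = pmf_fit(observed, n_rows=6, n_cols=4)
train_rmse = (sum((x - predict(U, V, i, j)) ** 2 for i, j, x in observed)
              / len(observed)) ** 0.5
```

Calling `predict(U, V, i, j)` on a hidden cell such as `(4, 2)` gives the model's imputed value. A hierarchical variant would additionally tie each species' latent vector to that of its genus, family, and so on, which this sketch omits.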
Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple scales. Using ideas from multiresolution analysis (MRA), MMF teased out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While MMF is effective for such matrices, plenty of data is more naturally represented as nonsymmetric matrices (e.g. directed graphs) but nevertheless has similar hierarchical structure. In this paper, we explore techniques for extending MMF to any square matrix. We validate our approach on numerous matrix compression tasks, demonstrating its efficacy compared to low-rank methods. Moreover, we show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective for matrix compression than either of the two complementary methods alone.
Traffic microscopic simulation applications are a common tool in road transportation analysis, and several attempts to perform road safety assessments with them have recently been carried out. However, these approaches often ignore causal relationships between different levels of vehicle interactions and/or accident types, and they lack a physical representation of the accident phenomenon itself. In this paper, a new generic probabilistic safety assessment framework for traffic microscopic simulation tools is proposed. The probability of a specific accident occurring is estimated by an accident propensity function that consists of a deterministic safety score component and a random component. The formulation of the safety score depends on the type of occurrence, on detailed vehicle interactions and maneuvers, and on its representation in a simulation environment. This generic model is applied to the case of an urban motorway and specified for four types of outcomes: non-accident events and three types of accidents in a nested structure (rear-end, lane-changing, and run-off-road accidents). The model was estimated and validated using simulated microscopic data. To obtain consistent simulated data, a two-step simulation calibration procedure was adopted: (1) using real trajectories collected on site for detailed behavior representation; and (2) using aggregate data from each event used in safety model estimation. The final estimated safety model is able to identify and interpret several simulated vehicle interactions. The fact that these outcomes were extracted from simulated analysis shows the real potential of calibrated traffic microscopic simulation for detailed safety assessments.
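The propensity formulation in the abstract above (a deterministic safety score plus a random component) can be sketched as follows. The abstract does not state the distribution of the random component. Assuming, purely for illustration, that it is i.i.d. Gumbel across outcomes, the outcome probabilities take the familiar multinomial-logit (softmax) form. This flat version ignores the nested structure mentioned in the abstract, and the function name `outcome_probabilities` and the score values are hypothetical.

```python
import math

def outcome_probabilities(scores):
    """Multinomial-logit probabilities from deterministic safety scores.

    Assumption (not from the source): i.i.d. Gumbel random components,
    which yield the softmax form. The nested structure is not modeled.
    """
    m = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: e / total for k, e in exp_scores.items()}

# Hypothetical safety scores for one simulated vehicle interaction.
scores = {
    "no_accident": 0.0,
    "rear_end": -6.0,
    "lane_change": -7.0,
    "run_off_road": -8.0,
}
probs = outcome_probabilities(scores)
```

In an estimated model, each score would itself be a function of simulated interaction variables for the event in question, with coefficients fitted to data.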
Circadian clocks are oscillatory genetic networks that help organisms adapt to the 24-hour day/night cycle. The clock of the green alga Ostreococcus tauri is the simplest plant clock discovered so far. Its many advantages as an experimental system facilitate the testing of computational predictions. We present a model of the Ostreococcus clock in the stochastic process algebra Bio-PEPA and exploit its mapping to different analysis techniques, such as ordinary differential equations, stochastic simulation algorithms and model-checking. The small number of molecules reported for this system tests the limits of the continuous approximation underlying differential equations. We investigate the difference between continuous-deterministic and discrete-stochastic approaches. Stochastic simulation and model-checking allow us to formulate new hypotheses on the system behaviour, such as the presence of self-sustained oscillations in single cells under constant light conditions. We investigate how to model the timing of dawn and dusk in the context of model-checking, which we use to compute how the probability distributions of key biochemical species change over time. These show that the relative variation in expression level is smallest at the time of peak expression, making peak time an optimal experimental phase marker. Building on these analyses, we use approaches from evolutionary systems biology to investigate how changes in the rate of mRNA degradation affect the phase of a key protein likely to affect fitness. We explore how robust this circadian clock is to such potential mutational changes in its underlying biochemistry. Our work shows that multiple approaches lead to a more complete understanding of the clock.
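The contrast between continuous-deterministic and discrete-stochastic treatments in the abstract above can be illustrated on a much simpler system than the Ostreococcus clock. The sketch below is not the Bio-PEPA model; it uses a generic birth-death process for a single mRNA species, with hypothetical production rate `k` and degradation rate `g`, and compares the ODE solution with samples from Gillespie's stochastic simulation algorithm.

```python
import math
import random

def ode_mrna(k, g, n0, t):
    """Closed-form solution of the deterministic model dn/dt = k - g*n."""
    return k / g + (n0 - k / g) * math.exp(-g * t)

def gillespie_mrna(k, g, n0, t_end, rng):
    """Copy number at t_end from one exact stochastic trajectory (Gillespie SSA)."""
    t, n = 0.0, n0
    while True:
        birth, death = k, g * n      # reaction propensities
        total = birth + death
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return n
        if rng.random() * total < birth:
            n += 1                   # production event
        else:
            n -= 1                   # degradation event (only possible if n > 0)

# Hypothetical rates; the low copy number makes the fluctuations visible.
k, g, t_end = 5.0, 0.5, 20.0
rng = random.Random(0)
samples = [gillespie_mrna(k, g, 0, t_end, rng) for _ in range(1000)]
mean_n = sum(samples) / len(samples)
deterministic_n = ode_mrna(k, g, 0, t_end)
```

The mean over trajectories stays close to the ODE value, while individual cells (samples) scatter around it. That scatter is the information the deterministic model discards when molecule numbers are small.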
This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. Although this supervised approach requires a very large amount of paired data for training, it is still not robust against unknown environments. Another approach is to use non-negative matrix factorization (NMF) based on basis spectra trained on clean speech in advance and those adapted to noise on the fly. This semi-supervised approach, however, causes considerable signal distortion in enhanced speech due to the unrealistic assumption that speech spectrograms are linear combinations of the basis spectra. Replacing the poor linear generative model of clean speech in NMF with a VAE (a powerful nonlinear deep generative model) trained on clean speech, we formulate a unified probabilistic generative model of noisy speech. Given noisy speech as observed data, we can sample clean speech from its posterior distribution. The proposed method outperformed the conventional DNN-based method in unseen noisy environments.
We propose a novel model for a topic-aware chatbot by combining the traditional Recurrent Neural Network (RNN) encoder-decoder model with a topic attention layer based on Nonnegative Matrix Factorization (NMF). After learning topic vectors from an auxiliary text corpus via NMF, the decoder is trained so that it is more likely to sample response words from the most correlated topic vectors. One of the main advantages of our architecture is that the user can easily switch the NMF-learned topic vectors so that the chatbot obtains the desired topic-awareness. We demonstrate our model by training on a single conversational data set, which is then augmented with topic matrices learned from different auxiliary data sets. We show not only that our topic-aware chatbot outperforms its non-topic counterpart, but also that each topic-aware model gives the most qualitatively and contextually relevant answer depending on the topic of the question.
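The step of learning topic vectors via NMF, mentioned in the abstract above, can be sketched with the standard Lee-Seung multiplicative updates for the Frobenius objective. This covers only the factorization of a term-by-document matrix. The RNN encoder-decoder and the topic attention layer are not shown, and the tiny matrix, the word labels, and the helper names are hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(X, W, H):
    """Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5

def nmf(X, n_topics, iters=200, seed=0, eps=1e-9):
    """Factor X (terms x docs) as W (terms x topics) times H (topics x docs)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_topics)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Hypothetical term-by-document counts with two visible themes.
X = [
    [3, 2, 0, 0],  # "goal"
    [2, 3, 0, 0],  # "match"
    [0, 0, 4, 1],  # "vote"
    [0, 0, 1, 4],  # "election"
]
W0, H0 = nmf(X, n_topics=2, iters=0)  # random initialization only
W, H = nmf(X, n_topics=2)             # after 200 multiplicative updates
initial_error, final_error = frob_error(X, W0, H0), frob_error(X, W, H)
```

Each column of `W` is one topic vector over the vocabulary. Swapping in a `W` learned from a different auxiliary corpus corresponds to the topic switching the abstract describes.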