Inferring network properties from time series using transfer entropy and mutual information: validation of multivariate versus bivariate approaches

Added by Leonardo Novelli
Publication date: 2020
Research language: English





Functional and effective networks inferred from time series are at the core of network neuroscience. Interpreting their properties requires inferred network models to reflect key underlying structural features; however, even a few spurious links can distort network measures, challenging functional connectomes. We study the extent to which micro- and macroscopic properties of underlying networks can be inferred by algorithms based on mutual information and bivariate/multivariate transfer entropy. The validation is performed on two macaque connectomes and on synthetic networks with various topologies (regular lattice, small-world, random, scale-free, modular). Simulations are based on a neural mass model and on autoregressive dynamics (employing Gaussian estimators for direct comparison to functional connectivity and Granger causality). We find that multivariate transfer entropy captures key properties of all networks for longer time series. Bivariate methods can achieve higher recall (sensitivity) for shorter time series but are unable to control false positives (lower specificity) as available data increases. This leads to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest path lengths and hub centrality, and fattened degree distribution tails. Caution should therefore be used when interpreting network properties of functional connectomes obtained via correlation or pairwise statistical dependence measures, rather than more holistic (yet data-hungry) multivariate models.
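The contrast between bivariate and multivariate transfer entropy can be illustrated with a minimal sketch. It assumes linear Gaussian dynamics, in which case transfer entropy reduces to half the log-ratio of residual variances used in Granger causality. The toy system (a common driver x feeding y and z at different lags), the coefficients, and the function names `lagged`, `residual_variance`, and `gaussian_te` are illustrative assumptions, not the paper's algorithm or the IDTxl API.

```python
import numpy as np


def lagged(series, lags, max_lag):
    """Stack series[t - l] for each lag l, for t = max_lag .. n-1."""
    n = len(series)
    return np.column_stack([series[max_lag - l: n - l] for l in lags])


def residual_variance(target, predictors):
    """Residual variance of an ordinary least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.mean((target - design @ beta) ** 2)


def gaussian_te(source, target, conditioning=(), lags=(1, 2)):
    """Transfer entropy (nats) from source to target under a Gaussian model.

    With conditioning=() this is the bivariate estimate; passing other
    time series in `conditioning` gives a conditional (multivariate) one.
    """
    max_lag = max(lags)
    future = target[max_lag:]
    base = np.column_stack(
        [lagged(target, lags, max_lag)]
        + [lagged(c, lags, max_lag) for c in conditioning]
    )
    full = np.column_stack([base, lagged(source, lags, max_lag)])
    return 0.5 * np.log(
        residual_variance(future, base) / residual_variance(future, full)
    )


# Hypothetical toy system: x drives y (lag 1) and z (lag 2).
# There is no direct link between y and z.
rng = np.random.default_rng(0)
n = 5000
noise_x, noise_y, noise_z = rng.standard_normal((3, n))
x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + noise_x[t]
    y[t] = 0.8 * x[t - 1] + noise_y[t]
    z[t] = 0.8 * x[t - 2] + noise_z[t]

te_bivariate = gaussian_te(y, z)                      # ignores x
te_conditional = gaussian_te(y, z, conditioning=[x])  # accounts for x
print(f"bivariate TE y->z:      {te_bivariate:.4f} nats")
print(f"conditional TE y->z|x:  {te_conditional:.4f} nats")
```

In this sketch the bivariate estimate is clearly positive even though y does not drive z, because y carries earlier information about the common driver. Conditioning on the past of x removes most of that apparent transfer, which is the kind of spurious link the abstract attributes to bivariate methods.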




Network inference algorithms are valuable tools for the study of large-scale neuroimaging datasets. Multivariate transfer entropy is well suited for this task, being a model-free measure that captures nonlinear and lagged dependencies between time series to infer a minimal directed network model. Greedy algorithms have been proposed to efficiently deal with high-dimensional datasets while avoiding redundant inferences and capturing synergistic effects. However, multiple statistical comparisons may inflate the false positive rate and are computationally demanding, which limited the size of previous validation studies. The algorithm we present, as implemented in the IDTxl open-source software, addresses these challenges by employing hierarchical statistical tests to control the family-wise error rate and to allow for efficient parallelisation. The method was validated on synthetic datasets involving random networks of increasing size (up to 100 nodes), for both linear and nonlinear dynamics. The performance increased with the length of the time series, reaching consistently high precision, recall, and specificity (>98% on average) for 10000 time samples. Varying the statistical significance threshold showed a more favourable precision-recall trade-off for longer time series. Both the network size and the sample size are one order of magnitude larger than previously demonstrated, showing feasibility for typical EEG and MEG experiments.
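As a minimal sketch of how the precision, recall, and specificity mentioned above can be computed for an inferred directed network, the following compares a true and an inferred adjacency matrix. The function name `link_scores`, the convention that entry [i, j] marks a link from node i to node j, and the exclusion of self-links are assumptions made here for illustration.

```python
import numpy as np


def link_scores(true_adj, inferred_adj):
    """Precision, recall and specificity of inferred directed links.

    Both inputs are square 0/1 (or boolean) adjacency matrices.
    Self-links on the diagonal are ignored.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    inferred_adj = np.asarray(inferred_adj, dtype=bool)
    off_diagonal = ~np.eye(true_adj.shape[0], dtype=bool)
    truth = true_adj[off_diagonal]
    found = inferred_adj[off_diagonal]

    tp = int(np.sum(truth & found))    # true links that were inferred
    fp = int(np.sum(~truth & found))   # spurious links
    fn = int(np.sum(truth & ~found))   # missed links
    tn = int(np.sum(~truth & ~found))  # correctly absent links

    def ratio(numerator, denominator):
        # Undefined scores (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical 3-node example: true chain 0 -> 1 -> 2,
# inferred network adds one spurious shortcut 0 -> 2.
true_network = [[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]]
inferred_network = [[0, 1, 1],
                    [0, 0, 1],
                    [0, 0, 0]]
scores = link_scores(true_network, inferred_network)
print(scores)
```

Here both true links are recovered (recall 1), one of the three inferred links is spurious (precision 2/3), and three of the four absent links are correctly left out (specificity 3/4).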
We propose a new estimator to measure directed dependencies in time series. The dimensionality of data is first reduced using a new non-uniform embedding technique, where the variables are ranked according to a weighted sum of the amount of new information and improvement of the prediction accuracy provided by the variables. Then, using a greedy approach, the most informative subsets are selected in an iterative way. The algorithm terminates when the highest-ranked variable is not able to significantly improve the accuracy of the prediction as compared to that obtained using the existing selected subsets. In a simulation study, we compare our estimator to existing state-of-the-art methods at different data lengths and strengths of directed dependencies. It is demonstrated that the proposed estimator has a significantly higher accuracy than that of existing methods, especially for the difficult case, where the data is highly correlated and coupled. Moreover, we show that its rate of false detection of directed dependencies due to instantaneous coupling effects is lower than that of existing measures. We also show the applicability of the proposed estimator on real intracranial electroencephalography data.
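The greedy selection loop described above can be sketched in simplified form. This is not the authors' estimator: the ranking criterion is replaced here by the relative reduction in linear residual variance, and the significance test by a fixed threshold `min_gain`. Both substitutions, the candidate variables, and the function names are assumptions made for illustration.

```python
import numpy as np


def residual_variance(target, columns):
    """Residual variance of a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.mean((target - design @ beta) ** 2)


def greedy_select(target, candidates, min_gain=0.01):
    """Greedily pick candidate variables that improve prediction.

    candidates: dict mapping a name to a 1-D array aligned with target.
    At each step the candidate giving the lowest residual variance is
    ranked first; the loop stops when its relative gain is below
    min_gain (a stand-in for a proper significance test).
    """
    selected = []
    remaining = dict(candidates)
    current = residual_variance(target, [])
    while remaining:
        chosen_columns = [candidates[name] for name in selected]
        scores = {
            name: residual_variance(target, chosen_columns + [column])
            for name, column in remaining.items()
        }
        best = min(scores, key=scores.get)
        gain = (current - scores[best]) / current
        if gain < min_gain:
            break  # the top-ranked variable no longer helps enough
        selected.append(best)
        current = scores[best]
        del remaining[best]
    return selected


# Hypothetical data: the target depends on a and b, but not on c.
rng = np.random.default_rng(1)
a, b, c, noise = rng.standard_normal((4, 2000))
target = 2.0 * a + 1.0 * b + noise
selected = greedy_select(target, {"a": a, "b": b, "c": c})
print(selected)
```

In this toy setting the loop picks the strongest predictor first, then the weaker one, and stops before adding the irrelevant variable.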
Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
Natural and social multivariate systems are commonly studied through sets of simultaneous and time-spaced measurements of the observables that drive their dynamics, i.e., through sets of time series. Typically, this is done via hypothesis testing: the statistical properties of the empirical time series are tested against those expected under a suitable null hypothesis. This is a very challenging task in complex interacting systems, where statistical stability is often poor due to lack of stationarity and ergodicity. Here, we describe an unsupervised, data-driven framework to perform hypothesis testing in such situations. This consists of a statistical mechanical approach (analogous to the configuration model for networked systems) for ensembles of time series designed to preserve, on average, some of the statistical properties observed on an empirical set of time series. We showcase its possible applications with a case study on financial portfolio selection.
We study the optimality conditions of information transfer in systems with memory in the low signal-to-noise ratio regime of vanishing input amplitude. We find that the optimal mutual information is represented by a maximum-variance of the signal time course, with correlation structure determined by the Fisher information matrix. We illustrate the method on a simple biologically inspired model of an electrosensory neuron. Our general results also apply to the study of information transfer in single neurons subject to weak stimulation, with implications for the problem of coding efficiency in biological systems.