An important problem in analysis of neural data is to characterize interactions across brain regions from high-dimensional multiple-electrode recordings during a behavioral experiment. Lead-lag effects indicate possible directional flows of neural information, but they are often transient, appearing during short intervals of time. Such non-stationary interactions can be difficult to identify, but they can be found by taking advantage of the replication structure inherent to many neurophysiological experiments. To describe non-stationary interactions between replicated pairs of high-dimensional time series, we developed a method of estimating latent, non-stationary cross-correlation. Our approach begins with an extension of probabilistic CCA to the time series setting, which provides a model-based interpretation of multiset CCA. Because the covariance matrix describing non-stationary dependence is high-dimensional, we assume sparsity of cross-correlations within a range of possible interesting lead-lag effects. We show that the method can perform well in realistic settings and we apply it to 192 simultaneous local field potential (LFP) recordings from prefrontal cortex (PFC) and visual cortex (area V4) during a visual memory task. We find lead-lag relationships that are highly plausible, being consistent with related results in the literature.
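As a rough illustration of how replicated trials expose transient lead-lag effects, the sketch below computes a time-by-time cross-correlation matrix across trials for two scalar signals. It is a deliberately naive stand-in, not the latent, sparse, high-dimensional estimator the abstract describes; the function name, the simulated data, and the 3-sample lag are all assumptions made for the example.

```python
import numpy as np

def cross_corr_across_trials(x, y):
    """Correlate x at time s with y at time t, using trials as replicates.

    x, y: arrays of shape (n_trials, n_time).
    Returns C of shape (n_time, n_time) with C[s, t] = corr(x[:, s], y[:, t]).
    """
    # Standardize each time point across trials (population std, ddof=0).
    xz = (x - x.mean(axis=0)) / x.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    return xz.T @ yz / x.shape[0]

# Hypothetical data: y follows x with a 3-sample delay, but only for t in [20, 30).
rng = np.random.default_rng(0)
n_trials, n_time, lag = 500, 40, 3
x = rng.standard_normal((n_trials, n_time))
y = rng.standard_normal((n_trials, n_time))
y[:, 20:30] = x[:, 20 - lag:30 - lag] + 0.5 * rng.standard_normal((n_trials, 10))

C = cross_corr_across_trials(x, y)
# Large entries of C sit on the off-diagonal s = t - lag, and only inside the window.
```

Because the correlation is taken across trials at each pair of time points, no stationarity over time is assumed; the cost is an `n_time` by `n_time` matrix to estimate, which is why the abstract turns to sparsity in the high-dimensional case.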
Advances in neural recording present increasing opportunities to study neural activity in unprecedented detail. Latent variable models (LVMs) are promising tools for analyzing this rich activity across diverse neural systems and behaviors, as LVMs do not depend on known relationships between the activity and external experimental variables. However, progress in latent variable modeling is currently impeded by a lack of standardization, resulting in methods being developed and compared in an ad hoc manner. To coordinate these modeling efforts, we introduce a benchmark suite for latent variable modeling of neural population activity. We curate four datasets of neural spiking activity from cognitive, sensory, and motor areas to promote models that apply to the wide variety of activity seen across these areas. We identify unsupervised evaluation as a common framework for evaluating models across datasets, and apply several baselines that demonstrate benchmark diversity. We release this benchmark through EvalAI. http://neurallatents.github.io
Stationary and ergodic time series can be constructed using an s-vine decomposition based on sets of bivariate copula functions. The extension of such processes to infinite copula sequences is considered and shown to yield a rich class of models that generalizes Gaussian ARMA and ARFIMA processes to allow both non-Gaussian marginal behaviour and a non-Gaussian description of the serial partial dependence structure. Extensions of classical causal and invertible representations of linear processes to general s-vine processes are proposed and investigated. A practical and parsimonious method for parameterizing s-vine processes using the Kendall partial autocorrelation function is developed. The potential of the resulting models to give improved statistical fits in many applications is indicated with an example using macroeconomic data.
Adaptive collection of data is commonplace in applications throughout science and engineering. From the point of view of statistical inference, however, adaptive data collection induces memory and correlation in the samples and poses significant challenges. We consider high-dimensional linear regression, where the samples are collected adaptively and the sample size $n$ can be smaller than $p$, the number of covariates. In this setting, there are two distinct sources of bias: the first due to regularization imposed for consistent estimation, e.g. using the LASSO, and the second due to adaptivity in collecting the samples. We propose online debiasing, a general procedure for estimators such as the LASSO, which addresses both sources of bias. In two concrete contexts, $(i)$ time series analysis and $(ii)$ batched data collection, we demonstrate that online debiasing optimally debiases the LASSO estimate when the underlying parameter $\theta_0$ has sparsity of order $o(\sqrt{n}/\log p)$. In this regime, the debiased estimator can be used to compute $p$-values and confidence intervals of optimal size.
We introduce a class of semiparametric time series models by assuming a quasi-likelihood approach driven by a latent factor process. More specifically, given the latent process, we only specify the conditional mean and variance of the time series and use a quasi-likelihood function for estimating parameters related to the mean. The proposed methodology has three notable features: (i) no parametric form is assumed for the conditional distribution of the time series given the latent process; (ii) it can model non-negative, count, bounded/binary and real-valued time series; (iii) the dispersion parameter is not assumed to be known. Further, we obtain explicit expressions for the marginal moments and for the autocorrelation function of the time series process, so that a method of moments can be employed for estimating the dispersion parameter and the parameters related to the latent process. Simulation results assessing the proposed estimation procedure are presented. Real data analyses of unemployment rate and precipitation time series illustrate the practical potential of our methodology.
This paper deals with dimension reduction for high-dimensional time series based on common factors. In particular, we allow the dimension of the time series $p$ to be as large as, or even larger than, the sample size $n$. The estimation of the factor loading matrix and the factor process itself is carried out via an eigenanalysis of a $p\times p$ non-negative definite matrix. We show that when all the factors are strong, in the sense that the norm of each column in the factor loading matrix is of the order $p^{1/2}$, the estimator for the factor loading matrix, as well as the resulting estimator for the precision matrix of the original $p$-variate time series, are weakly consistent in $L_2$-norm with convergence rates independent of $p$. This result shows clearly that the `curse' is canceled out by the `blessings' in dimensionality. We also establish the asymptotic properties of the estimation when not all factors are strong. For the latter case, a two-step estimation procedure is preferred according to the asymptotic theory. The proposed methods, together with their asymptotic properties, are further illustrated in a simulation study. An application to a real data set is also reported.
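The eigenanalysis step can be sketched as follows. The abstract does not say which $p \times p$ non-negative definite matrix is used, so this example assumes one common choice: the sum of $\hat\Sigma(k)\hat\Sigma(k)^\top$ over a few lags $k$, where $\hat\Sigma(k)$ is the lag-$k$ sample autocovariance. The function name, the lag cutoff `k0`, and the simulated AR(1) factors are all hypothetical.

```python
import numpy as np

def estimate_factor_loadings(y, r, k0=1):
    """Estimate an r-dimensional factor loading space from y of shape (n, p).

    Assumed construction (not taken from the abstract):
    M = sum_{k=1..k0} S_k S_k^T, with S_k the lag-k sample autocovariance.
    Returns (A_hat, factors): orthonormal loadings (p, r) and factors (n, r).
    """
    n, p = y.shape
    yc = y - y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S_k = yc[k:].T @ yc[:-k] / n      # p x p lag-k autocovariance
        M += S_k @ S_k.T                  # symmetric, non-negative definite
    _, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    A_hat = eigvecs[:, ::-1][:, :r]       # eigenvectors of the r largest eigenvalues
    factors = yc @ A_hat                  # project the data onto the loading space
    return A_hat, factors

# Hypothetical example with p > n: two AR(1) factors with dense ("strong") loadings.
rng = np.random.default_rng(0)
n, p, r = 200, 300, 2
x = np.zeros((n, r))
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal(r)
A = rng.standard_normal((p, r))
y = x @ A.T + rng.standard_normal((n, p))

A_hat, f_hat = estimate_factor_loadings(y, r)
```

Only the column space of the loading matrix is identifiable, so `A_hat` should be compared with the true `A` through the subspaces they span rather than entry by entry.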