There have been many attempts to identify high-dimensional network features via multivariate approaches. Specifically, when the number of voxels or nodes, denoted as p, is substantially larger than the number of images, denoted as n, the model becomes under-determined, with infinitely many possible solutions. The small-n large-p problem is often remedied by regularizing the under-determined system with additional sparse penalties. Popular sparse network models include sparse correlations, LASSO, sparse canonical correlations, and graphical LASSO. These popular sparse models require optimizing L1-norm penalties, which has been the major computational bottleneck for solving large-scale problems. Thus, many existing sparse brain network models in brain imaging have been restricted to a few hundred nodes or fewer. The 2527 MRI features used in a LASSO model for Alzheimer's disease is probably the largest number of features used in any sparse model in the brain imaging literature.
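As a rough illustration of the small-n large-p setting and of an L1-penalized fit (not the large-scale network models discussed above), the sketch below fits scikit-learn's Lasso to simulated data with far more features than observations; the dimensions, variable names, and choice of library are assumptions made for the example.

    # Illustrative sketch: L1-penalized regression when p (features) >> n (images).
    # Not the large-scale sparse network model described above; the library choice
    # (scikit-learn) and all names are assumptions for this example.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 50, 2000                               # far fewer observations than features
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # only 5 truly active features
    y = X @ true_beta + 0.1 * rng.standard_normal(n)

    # The L1 penalty regularizes the under-determined system and yields a sparse fit.
    model = Lasso(alpha=0.1).fit(X, y)
    print("non-zero coefficients:", np.sum(model.coef_ != 0))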
We propose a novel technique to assess functional brain connectivity in EEG/MEG signals. Our method, called Sparsely-Connected Sources Analysis (SCSA), can overcome the problem of volume conduction by modeling neural data innovatively with the following ingredients: (a) the EEG is assumed to be a linear mixture of correlated sources following a multivariate autoregressive (MVAR) model, (b) the demixing is estimated jointly with the source MVAR parameters, and (c) overfitting is avoided by using the Group Lasso penalty. This approach allows us to extract the appropriate level of cross-talk between the extracted sources, and in this manner we obtain a sparse data-driven model of functional connectivity. We demonstrate the usefulness of SCSA with simulated data and compare it to a number of existing algorithms, with excellent results.
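A minimal sketch of one ingredient, a group-lasso-penalized MVAR fit via proximal gradient descent, is given below; it omits the joint estimation of the demixing that characterizes SCSA, and all function and variable names, step sizes, and penalty values are assumptions for illustration.

    # Sketch of a group-lasso-penalized MVAR fit (not the full SCSA algorithm:
    # the joint demixing estimation is omitted; names and tuning values are assumptions).
    import numpy as np

    def fit_mvar_group_lasso(x, order=2, lam=0.1, step=1e-3, iters=500):
        """x: (T, d) multivariate time series; returns AR coefficients (order, d, d)."""
        T, d = x.shape
        # Lagged design: predict x[t] from x[t-1], ..., x[t-order]
        Y = x[order:]                                                       # (T-order, d)
        Z = np.hstack([x[order - l:T - l] for l in range(1, order + 1)])    # (T-order, order*d)
        A = np.zeros((d, order * d))                                        # stacked AR coefficients

        for _ in range(iters):
            grad = (A @ Z.T - Y.T) @ Z / len(Y)       # gradient of the squared loss
            A = A - step * grad
            # Group soft-thresholding: each group holds the `order` lag coefficients of one
            # (target i, source j) pair, so cross-talk from j to i is switched off as a whole.
            for i in range(d):
                for j in range(d):
                    g = A[i, j::d]
                    norm = np.linalg.norm(g)
                    A[i, j::d] = 0.0 if norm <= step * lam else g * (1 - step * lam / norm)
        return A.reshape(d, order, d).transpose(1, 0, 2)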
Sensitivity analysis (SA) is an important aspect of process automation. It often aims to identify the process inputs that significantly influence the variance of the process outputs. Existing SA approaches typically treat the input-output relationship as a black box and conduct extensive random sampling from the actual process or its high-fidelity simulation model to identify the influential inputs. In this paper, an alternative, novel approach is proposed using a sparse polynomial chaos expansion-based model for a class of input-output relationships represented as directed acyclic networks. The model exploits the relationship structure by recursively relating a network node to its direct predecessors to trace the output variance back to the inputs. It thereby estimates the Sobol indices, which measure the influence of each input on the output variance, accurately and efficiently. Theoretical analysis establishes the validity of the model, as the prediction of the network output converges in probability to the true output under certain regularity conditions. Empirical evaluation on two manufacturing processes shows that the model estimates the Sobol indices accurately with far fewer observations than a state-of-the-art Monte Carlo sampling method.
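To make concrete what a first-order Sobol index measures, the sketch below uses a standard pick-freeze Monte Carlo estimator on a toy function; this is the kind of sampling-based baseline the expansion-based model is contrasted with, not the proposed method itself, and all names and sample sizes are illustrative assumptions.

    # Pick-freeze Monte Carlo sketch of first-order Sobol indices, shown only to make
    # concrete what the indices measure; not the paper's polynomial-chaos approach.
    import numpy as np

    def first_order_sobol(f, dim, n=100_000, rng=None):
        """Estimate S_i = Var(E[Y | X_i]) / Var(Y) for f with independent U(0,1) inputs."""
        rng = rng or np.random.default_rng(0)
        A = rng.random((n, dim))
        B = rng.random((n, dim))
        yA, yB = f(A), f(B)
        var_y = np.var(np.concatenate([yA, yB]))
        S = np.empty(dim)
        for i in range(dim):
            ABi = A.copy()
            ABi[:, i] = B[:, i]                           # freeze all inputs except X_i
            S[i] = np.mean(yB * (f(ABi) - yA)) / var_y    # Saltelli-style estimator
        return S

    # Example: Y = X1 + 2*X2, so X2 should account for roughly 80% of the output variance
    print(first_order_sobol(lambda X: X[:, 0] + 2 * X[:, 1], dim=2))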
Traffic flow count data in networks arise in many applications, such as automobile or aviation transportation, certain directed social network contexts, and Internet studies. Using an example of Internet browser traffic flow through site-segments of an international news website, we present Bayesian analyses of two linked classes of models which, in tandem, allow fast, scalable and interpretable Bayesian inference. We first develop flexible state-space models for streaming count data, able to adaptively characterize and quantify network dynamics efficiently in real-time. We then use these models as emulators of more structured, time-varying gravity models that allow formal dissection of network dynamics. This yields interpretable inferences on traffic flow characteristics, and on dynamics in interactions among network nodes. Bayesian monitoring theory defines a strategy for sequential model assessment and adaptation in cases when network flow data deviates from model-based predictions. Exploratory and sequential monitoring analyses of evolving traffic on a network of web site-segments in e-commerce demonstrate the utility of this coupled Bayesian emulation approach to analysis of streaming network count data.
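As a toy stand-in for the streaming state-space count models described above, the sketch below runs a discount-based Poisson-gamma filter that sequentially tracks a flow rate; the discount factor, priors, and names are assumptions for illustration and do not reproduce the models or the gravity-model emulation developed in the paper.

    # Illustrative discount-based Poisson-gamma filter for a streaming count series;
    # a toy stand-in for richer state-space flow models (all settings are assumptions).
    import numpy as np

    def poisson_gamma_filter(counts, discount=0.95, a0=1.0, b0=1.0):
        """Sequentially update a Gamma(a, b) belief about the Poisson flow rate."""
        a, b = a0, b0
        rate_estimates = []
        for y in counts:
            a, b = discount * a, discount * b     # discount past information (adaptivity)
            a, b = a + y, b + 1.0                 # conjugate Poisson-gamma update
            rate_estimates.append(a / b)          # posterior mean of the current rate
        return np.array(rate_estimates)

    # Example: a flow whose rate jumps from 5 to 20 halfway through the stream
    rng = np.random.default_rng(1)
    y = np.concatenate([rng.poisson(5, 200), rng.poisson(20, 200)])
    print(poisson_gamma_filter(y)[[0, 199, 399]])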
We consider the offline change point detection and localization problem in the context of piecewise stationary networks, where the observable is a finite sequence of networks. We develop algorithms involving suitably modified CUSUM statistics based on adaptively trimmed adjacency matrices of the observed networks for both detection and localization of single or multiple change points present in the input data. We provide a rigorous theoretical analysis and finite sample estimates evaluating the performance of the proposed methods when the input (a finite sequence of networks) is generated from an inhomogeneous random graph model, where the change points are characterized by changes in the mean adjacency matrix. We show that the proposed algorithms consistently detect (resp. localize) all change points for which the change in the expected adjacency matrix is above the minimax detectability (resp. localizability) threshold, without any a priori assumption about (a) a lower bound for the sparsity of the underlying networks, (b) an upper bound for the number of change points, or (c) a lower bound for the separation between successive change points, provided either the minimum separation between successive pairs of change points or the average degree of the underlying networks goes to infinity arbitrarily slowly. We also prove that this condition is necessary for consistency.
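The following sketch illustrates a basic CUSUM-type scan for a single change point in a sequence of adjacency matrices, using the Frobenius norm of the difference of mean adjacency matrices before and after each candidate split; the adaptive trimming and the multiple-change-point machinery of the proposed algorithms are not shown, and all names and simulation settings are illustrative assumptions.

    # Basic CUSUM-type scan for one change point in a sequence of adjacency matrices;
    # the adaptive trimming of the proposed algorithms is omitted (names are assumptions).
    import numpy as np

    def cusum_change_point(adj_seq):
        """adj_seq: array (T, n, n). Returns (best split t, CUSUM statistic at t)."""
        T = adj_seq.shape[0]
        stats = np.full(T, -np.inf)
        for t in range(1, T):
            left_mean = adj_seq[:t].mean(axis=0)
            right_mean = adj_seq[t:].mean(axis=0)
            weight = np.sqrt(t * (T - t) / T)     # standard CUSUM weighting of the split
            stats[t] = weight * np.linalg.norm(left_mean - right_mean)   # Frobenius norm
        t_hat = int(np.argmax(stats))
        return t_hat, stats[t_hat]

    # Example: 40 sparse graphs whose edge probability doubles after time 25
    rng = np.random.default_rng(2)
    A = np.concatenate([rng.random((25, 50, 50)) < 0.05,
                        rng.random((15, 50, 50)) < 0.10]).astype(float)
    print(cusum_change_point(A))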
Its conceptual appeal and effectiveness have made latent factor modeling an indispensable tool for multivariate analysis. Despite its popularity across many fields, outstanding methodological challenges have hampered practical deployments. One major challenge is the selection of the number of factors, which is exacerbated for dynamic factor models, where factors can disappear, emerge, and/or reoccur over time. Existing tools that assume a fixed number of factors may provide a misguided representation of the data mechanism, especially when the number of factors is crudely misspecified. Another challenge is the interpretability of the factor structure, which is often regarded as an unattainable objective due to the lack of identifiability. Motivated by a topical macroeconomic application, we develop a flexible Bayesian method for dynamic factor analysis (DFA) that can simultaneously accommodate a time-varying number of factors and enhance interpretability without strict identifiability constraints. To this end, we turn to dynamic sparsity by employing Dynamic Spike-and-Slab (DSS) priors within DFA. Scalable Bayesian EM estimation is proposed for fast posterior mode identification via rotations to sparsity, enabling Bayesian data analysis at scales that would previously have been time-consuming. We study a large-scale balanced panel of macroeconomic variables covering multiple facets of the US economy, with a focus on the Great Recession, to highlight the efficacy and usefulness of our proposed method.
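As a static illustration of the spike-and-slab ingredient behind DSS priors, the sketch below computes the conditional slab (inclusion) weight under a two-component Laplace mixture, which governs how strongly each factor loading is shrunk; the dynamic (time-varying) extension and the EM rotations to sparsity are not shown, and the hyperparameter values and names are assumptions.

    # Static spike-and-slab sketch: a mixture of two Laplace densities whose conditional
    # mixing weight adapts the shrinkage per coefficient. The dynamic extension used in
    # the paper is not shown; hyperparameters and names are assumptions.
    import numpy as np

    def laplace_pdf(beta, lam):
        return 0.5 * lam * np.exp(-lam * np.abs(beta))

    def inclusion_weight(beta, lam_spike=20.0, lam_slab=1.0, theta=0.2):
        """P(coefficient comes from the slab | beta): large value -> weak shrinkage."""
        slab = theta * laplace_pdf(beta, lam_slab)
        spike = (1 - theta) * laplace_pdf(beta, lam_spike)
        return slab / (slab + spike)

    # Small loadings are attributed to the spike (heavy shrinkage toward zero),
    # large loadings to the slab (treated as genuine factor structure).
    for b in [0.01, 0.1, 1.0]:
        print(b, round(inclusion_weight(b), 3))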