Do you want to publish a course? Click here

A Community-Developed Open-Source Computational Ecosystem for Big Neuro Data

321   0   0.0 ( 0 )
 Added by Joshua Vogelstein
 Publication date 2018
  fields Biology
and research's language is English




Ask ChatGPT about the research

Big imaging data is becoming more prominent in brain sciences across spatiotemporal scales and phylogenies. We have developed a computational ecosystem that enables storage, visualization, and analysis of these data in the cloud, thusfar spanning 20+ publications and 100+ terabytes including nanoscale ultrastructure, microscale synaptogenetic diversity, and mesoscale whole brain connectivity, making NeuroData the largest and most diverse open repository of brain data.



rate research

Read More

1. Joint Species Distribution models (JSDMs) explain spatial variation in community composition by contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly on large datasets, limiting their usefulness for currently emerging big (e.g., metabarcoding and metagenomics) community datasets. 2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need to use latent variables by using a Monte-Carlo integration of the joint JSDM likelihood and allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can make use of CPU and GPU calculations. Using simulated communities with known species-species associations and different number of species and sites, we compare sjSDM with state-of-the-art JSDM implementations to determine computational runtimes and accuracy of the inferred species-species and species-environmental associations. 3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using eDNA case study with thousands of fungi operational taxonomic units (OTU). 4. Our sjSDM approach makes the analysis of JSDMs to large community datasets with hundreds or thousands of species possible, substantially extending the applicability of JSDMs in ecology. We provide our method in an R package to facilitate its applicability for practical data analysis.
Normal mode analysis offers an efficient way of modeling the conformational flexibility of protein structures. Simple models defined by contact topology, known as elastic network models, have been used to model a variety of systems, but the validation is typically limited to individual modes for a single protein. We use anisotropic displacement parameters from crystallography to test the quality of prediction of both the magnitude and directionality of conformational variance. Normal modes from four simple elastic network model potentials and from the CHARMM forcefield are calculated for a data set of 83 diverse, ultrahigh resolution crystal structures. While all five potentials provide good predictions of the magnitude of flexibility, the methods that consider all atoms have a clear edge at prediction of directionality, and the CHARMM potential produces the best agreement. The low-frequency modes from different potentials are similar, but those computed from the CHARMM potential show the greatest difference from the elastic network models. This was illustrated by computing the dynamic correlation matrices from different potentials for a PDZ domain structure. Comparison of normal mode results with anisotropic temperature factors opens the possibility of using ultrahigh resolution crystallographic data as a quantitative measure of molecular flexibility. The comprehensive evaluation demonstrates the costs and benefits of using normal mode potentials of varying complexity. Comparison of the dynamic correlation matrices suggests that a combination of topological and chemical potentials may help identify residues in which chemical forces make large contributions to intramolecular coupling.
187 - G. Wu , W.Liao , S. Stramaglia 2012
A great improvement to the insight on brain function that we can get from fMRI data can come from effective connectivity analysis, in which the flow of information between even remote brain regions is inferred by the parameters of a predictive dynamical model. As opposed to biologically inspired models, some techniques as Granger causality (GC) are purely data-driven and rely on statistical prediction and temporal precedence. While powerful and widely applicable, this approach could suffer from two main limitations when applied to BOLD fMRI data: confounding effect of hemodynamic response function (HRF) and conditioning to a large number of variables in presence of short time series. For task-related fMRI, neural population dynamics can be captured by modeling signal dynamics with explicit exogenous inputs; for resting-state fMRI on the other hand, the absence of explicit inputs makes this task more difficult, unless relying on some specific prior physiological hypothesis. In order to overcome these issues and to allow a more general approach, here we present a simple and novel blind-deconvolution technique for BOLD-fMRI signal. Coming to the second limitation, a fully multivariate conditioning with short and noisy data leads to computational problems due to overfitting. Furthermore, conceptual issues arise in presence of redundancy. We thus apply partial conditioning to a limited subset of variables in the framework of information theory, as recently proposed. Mixing these two improvements we compare the differences between BOLD and deconvolved BOLD level effective networks and draw some conclusions.
Motivation: In this paper we present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is adding support for big data, making it possible to efficiently run large genomic data mining analyses. Additional enhancements include integration with R and Bioconductor and an option to remove influence of missing value on the final result. Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows. For the largest dataset we observed over 6.6 fold speedup in computation time on a cluster of 8 GPUs compared to running the method on a single GPU. This proves high scalability of the algorithm. Availability: The latest version of EBIC could be downloaded from http://github.com/EpistasisLab/ebic . Installation and usage instructions are also available online.
The occurrence and distributions of wildlife populations and communities are shifting as a result of global changes. To evaluate whether these shifts are negatively impacting biodiversity processes, it is critical to monitor the status, trends, and effects of environmental variables on entire communities. However, modeling the dynamics of multiple species simultaneously can require large amounts of diverse data, and few modeling approaches exist to simultaneously provide species and community level inferences. We present an integrated community occupancy model (ICOM) that unites principles of data integration and hierarchical community modeling in a single framework to provide inferences on species-specific and community occurrence dynamics using multiple data sources. We use simulations to compare the ICOM to previously developed hierarchical community occupancy models and single species integrated distribution models. We then apply our model to assess the occurrence and biodiversity dynamics of foliage-gleaning birds in the White Mountain National Forest in the northeastern USA from 2010-2018 using three independent data sources. Simulations reveal that integrating multiple data sources in the ICOM increased precision and accuracy of species and community level inferences compared to single data source models, although benefits of integration were dependent on data source quality (e.g., amount of replication). Compared to single species models, the ICOM yielded more precise species-level estimates. Within our case study, the ICOM had the highest out-of-sample predictive performance compared to single species models and models that used only a subset of the three data sources. The ICOM offers an attractive approach to estimate species and biodiversity dynamics, which is additionally valuable to inform management objectives of both individual species and their broader communities.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا