No Arabic abstract
Motivation: In this paper we present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is adding support for big data, making it possible to efficiently run large genomic data mining analyses. Additional enhancements include integration with R and Bioconductor and an option to remove influence of missing value on the final result. Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows. For the largest dataset we observed over 6.6 fold speedup in computation time on a cluster of 8 GPUs compared to running the method on a single GPU. This proves high scalability of the algorithm. Availability: The latest version of EBIC could be downloaded from http://github.com/EpistasisLab/ebic . Installation and usage instructions are also available online.
In this paper a novel biclustering algorithm based on artificial intelligence (AI) is introduced. The method called EBIC aims to detect biologically meaningful, order-preserving patterns in complex data. The proposed algorithm is probably the first one capable of discovering with accuracy exceeding 50% multiple complex patterns in real gene expression datasets. It is also one of the very few biclustering methods designed for parallel environments with multiple graphics processing units (GPUs). We demonstrate that EBIC outperforms state-of-the-art biclustering methods, in terms of recovery and relevance, on both synthetic and genetic datasets. EBIC also yields results over 12 times faster than the most accurate reference algorithms. The proposed algorithm is anticipated to be added to the repertoire of unsupervised machine learning algorithms for the analysis of datasets, including those from large-scale genomic studies.
Big imaging data is becoming more prominent in brain sciences across spatiotemporal scales and phylogenies. We have developed a computational ecosystem that enables storage, visualization, and analysis of these data in the cloud, thusfar spanning 20+ publications and 100+ terabytes including nanoscale ultrastructure, microscale synaptogenetic diversity, and mesoscale whole brain connectivity, making NeuroData the largest and most diverse open repository of brain data.
We present Clinica (www.clinica.run), an open-source software platform designed to make clinical neuroscience studies easier and more reproducible. Clinica aims for researchers to i) spend less time on data management and processing, ii) perform reproducible evaluations of their methods, and iii) easily share data and results within their institution and with external collaborators. The core of Clinica is a set of automatic pipelines for processing and analysis of multimodal neuroimaging data (currently, T1-weighted MRI, diffusion MRI and PET data), as well as tools for statistics, machine learning and deep learning. It relies on the brain imaging data structure (BIDS) for the organization of raw neuroimaging datasets and on established tools written by the community to build its pipelines. It also provides converters of public neuroimaging datasets to BIDS (currently ADNI, AIBL, OASIS and NIFD). Processed data include image-valued scalar fields (e.g. tissue probability maps), meshes, surface-based scalar fields (e.g. cortical thickness maps) or scalar outputs (e.g. regional averages). These data follow the ClinicA Processed Structure (CAPS) format which shares the same philosophy as BIDS. Consistent organization of raw and processed neuroimaging files facilitates the execution of single pipelines and of sequences of pipelines, as well as the integration of processed data into statistics or machine learning frameworks. The target audience of Clinica is neuroscientists or clinicians conducting clinical neuroscience studies involving multimodal imaging, and researchers developing advanced machine learning algorithms applied to neuroimaging data.
Due to the instruments non-trivial resolution function, measurements on triple-axis spectrometers require extra care from the experimenter in order to obtain optimal results and to avoid unwanted spurious artefacts. We present a free and open-source software system that aims to ease many of the tasks encountered during the planning phase, in the execution and in data treatment of experiments performed on neutron triple-axis spectrometers. The software is currently in use and has been successfully tested at the MLZ, but can be configured to work with other triple-axis instruments and instrument control systems.
Cryo-electron tomography (cryo-ET) is an emerging technology for the 3D visualization of structural organizations and interactions of subcellular components at near-native state and sub-molecular resolution. Tomograms captured by cryo-ET contain heterogeneous structures representing the complex and dynamic subcellular environment. Since the structures are not purified or fluorescently labeled, the spatial organization and interaction between both the known and unknown structures can be studied in their native environment. The rapid advances of cryo-electron tomography (cryo-ET) have generated abundant 3D cellular imaging data. However, the systematic localization, identification, segmentation, and structural recovery of the subcellular components require efficient and accurate large-scale image analysis methods. We introduce AITom, an open-source artificial intelligence platform for cryo-ET researchers. AITom provides many public as well as in-house algorithms for performing cryo-ET data analysis through both the traditional template-based or template-free approach and the deep learning approach. AITom also supports remote interactive analysis. Comprehensive tutorials for each analysis module are provided to guide the user through. We welcome researchers and developers to join this collaborative open-source software development project. Availability: https://github.com/xulabs/aitom