Modern technologies enable scientists to collect extraordinary amounts of complex data across an unprecedented range of scales. With this influx of data, the focus can shift to the question of how we analyze and understand it. Unfortunately, a lack of standardized sharing mechanisms and practices often makes reproducing or extending scientific results difficult. With the advent of data organization structures and tools that drastically improve code portability, we now have the opportunity to design a framework for communicating extensible scientific discoveries. Our proposed solution leverages these existing technologies and standards to provide an accessible and extensible model for reproducible research, called science in the cloud (sic). Exploiting scientific containers, cloud computing, and cloud data services, we demonstrate the ability to launch a computer in the cloud and run a web service that enables intimate interaction with the tools and data presented. We hope this model will inspire the community to produce reproducible and, importantly, extensible results that will collectively accelerate the rate at which scientific breakthroughs are discovered, replicated, and extended.
The field of connectomics faces unprecedented big data challenges. To reconstruct neuronal connectivity, automated pixel-level segmentation is required for petabytes of streaming electron microscopy data. Existing algorithms provide relatively good accuracy but are unacceptably slow, and would require years to extract connectivity graphs from even a single cubic millimeter of neural tissue. Here we present a viable real-time solution: a multi-pass pipeline optimized for shared-memory multicore systems, capable of processing data at near the terabyte-per-hour pace of multi-beam electron microscopes. The pipeline makes an initial fast pass over the data, then a second slow pass that iteratively corrects errors in the fast-pass output. We demonstrate the accuracy of a sparse slow-pass reconstruction algorithm and suggest new methods for detecting morphological errors. Our fast-pass approach posed many algorithmic challenges, including the design and implementation of novel shallow convolutional neural nets and the parallelization of watershed and object-merging techniques. We use it to reconstruct, from image stack to skeletons, the full dataset of Kasthuri et al. (463 GB capturing 120,000 cubic microns) in a matter of hours on a single multicore machine rather than the weeks it has taken in the past on much larger distributed systems.
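The watershed primitive the pipeline parallelizes can be illustrated with a serial priority-flood sketch: seeds grow outward over a boundary-probability map, always claiming the least membrane-like pixel next, so basins meet along membranes. This is a minimal toy, not the authors' parallel implementation, and all names and data here are hypothetical.

```python
import heapq
import numpy as np

def priority_flood_watershed(boundary, markers):
    """Grow labeled seeds outward, always expanding the pixel with the
    lowest boundary probability next; basins meet along membranes.
    (Illustrative serial sketch, not the paper's parallel algorithm.)"""
    h, w = boundary.shape
    labels = markers.copy()
    heap = [(boundary[y, x], y, x) for y in range(h) for x in range(w) if labels[y, x]]
    heapq.heapify(heap)
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]   # claim neighbor for this basin
                heapq.heappush(heap, (boundary[ny, nx], ny, nx))
    return labels

# Toy boundary-probability map: a bright membrane (row 3) splits the volume.
boundary = np.zeros((7, 7))
boundary[3, :] = 1.0
markers = np.zeros((7, 7), dtype=int)
markers[1, 3], markers[5, 3] = 1, 2             # one seed per putative neuron
labels = priority_flood_watershed(boundary, markers)
print(labels[0, 0], labels[6, 6])  # → 1 2
```

In the pipeline this primitive runs over 3D boundary-probability volumes produced by the convolutional nets, and the resulting over-segmented basins are then joined by the object-merging stage.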
An unsolved challenge in the development of antigen-specific immunotherapies is determining the optimal antigens to target. Understanding antigen-MHC binding is paramount to achieving this goal. Here, we present CASTELO, a combined machine learning-molecular dynamics (ML-MD) approach to design novel antigens with increased MHC binding affinity for a Type 1 diabetes (T1D)-implicated system. We build upon a small-molecule lead optimization algorithm by training a convolutional variational autoencoder (CVAE) on MD trajectories of 48 different systems across 4 antigens and 4 HLA serotypes. We develop several new machine learning metrics, including a structure-based anchor residue classification model as well as cluster comparison scores. ML-MD predictions agree well with experimental binding results and free energy perturbation-predicted binding affinities. Moreover, ML-MD metrics are independent of traditional MD stability metrics such as contact area and RMSF, which do not reflect binding affinity data. Our work supports the role of structure-based deep learning techniques in antigen-specific immunotherapy design.
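For readers unfamiliar with the traditional MD stability metrics the abstract contrasts with the ML-MD scores, per-atom RMSF is the root-mean-square fluctuation of each atom about its mean position across trajectory frames. A minimal sketch on toy data (not the study's pipeline; real trajectories would first be aligned to a reference structure):

```python
import numpy as np

def rmsf(traj):
    """traj: (n_frames, n_atoms, 3) coordinates, assumed already aligned.
    Returns the per-atom root-mean-square fluctuation about the mean structure."""
    mean_pos = traj.mean(axis=0)                   # (n_atoms, 3) average structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)  # squared deviation per frame/atom
    return np.sqrt(sq_dev.mean(axis=0))            # (n_atoms,)

# Toy trajectory: atom 0 is rigid, atom 1 oscillates ±1 along x.
traj = np.zeros((4, 2, 3))
traj[:, 1, 0] = [-1.0, 1.0, -1.0, 1.0]
print(rmsf(traj))  # → [0. 1.]
```

A low RMSF indicates a conformationally stable residue, which, as the abstract notes, does not by itself track binding affinity.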
Reconstructing a map of neuronal connectivity is a critical challenge in contemporary neuroscience. Recent advances in high-throughput serial section electron microscopy (EM) have produced massive 3D image volumes of nanoscale brain tissue for the first time. The resolution of EM allows individual neurons and their synaptic connections to be directly observed. Recovering neuronal networks by manually tracing each neuronal process at this scale is unmanageable, and therefore researchers are developing automated image processing modules. Thus far, state-of-the-art algorithms focus only on the solution to a particular task (e.g., neuron segmentation or synapse identification). In this manuscript we present the first fully automated images-to-graphs pipeline (i.e., a pipeline that begins with an imaged volume of neural tissue and produces a brain graph without any human interaction). To evaluate overall performance and select the best parameters and methods, we also develop a metric to assess the quality of the output graphs. We evaluate a set of algorithms and parameters, searching across possible operating points to identify the best available brain graph under our assessment metric. Finally, we deploy a reference end-to-end version of the pipeline on a large, publicly available data set. This provides a baseline result and framework for community analysis and future algorithm development and testing. All code and data derivatives have been made publicly available, toward eventually unlocking new biofidelic computational primitives and a deeper understanding of neuropathologies.
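One simple way to score a recovered brain graph against a reference is precision/recall/F1 over the edges of the two adjacency matrices. This is only an illustrative stand-in; the paper defines its own assessment metric, and the matrices below are hypothetical.

```python
import numpy as np

def edge_f1(pred, ref):
    """F1 score over directed edges of two adjacency matrices
    (illustrative graph-quality metric, not the paper's)."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    tp = np.logical_and(pred, ref).sum()     # edges found in both graphs
    fp = np.logical_and(pred, ~ref).sum()    # spurious edges
    fn = np.logical_and(~pred, ref).sum()    # missed edges
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

ref  = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # reference connectivity (hypothetical)
pred = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]   # pipeline output: 1 hit, 1 miss, 1 spurious
print(edge_f1(pred, ref))  # → 0.5
```

Sweeping pipeline parameters and keeping the operating point that maximizes such a score is the kind of search the abstract describes.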
Freely and openly shared low-cost electronic applications, known as open electronics, have sparked a new open-source movement, with much untapped potential to advance scientific research. Initially designed to appeal to electronics hobbyists, open electronics have formed a global community of makers and inventors and are increasingly used in science and industry. Here, we review the current benefits of open electronics for scientific research and guide academics entering this emerging field. We discuss how electronic applications, from the experimental to the theoretical sciences, can help (I) individual researchers, by increasing the customization, efficiency, and scalability of experiments while improving data quantity and quality; (II) scientific institutions, by improving access to and maintenance of high-end technologies and increasing visibility and interdisciplinary collaboration potential; and (III) the scientific community, by improving transparency and reproducibility, helping decouple research capacity from funding, increasing innovation, and improving collaboration potential among researchers and the public. Open electronics are powerful tools to increase the creativity, democratization, and reproducibility of research, and thus offer practical solutions to overcome significant barriers in science.
Neuroscientists are now able to acquire data at staggering rates across spatiotemporal scales. However, our ability to capitalize on existing datasets, tools, and intellectual capacities is hampered by technical challenges. The key barriers to accelerating scientific discovery correspond to the FAIR data principles: findability, global access to data, software interoperability, and reproducibility/re-usability. We conducted a hackathon dedicated to making strides on each of these fronts. This manuscript is a technical report summarizing those achievements, which we hope serves as an example of the effectiveness of focused, deliberate hackathons in advancing our quickly evolving field.