Orchestrating parametric fitting of multicomponent spectra at scale is an essential yet underappreciated task in high-throughput quantification of materials and chemical composition. To automate the annotation process for spectroscopic and diffraction data collected in batches of hundreds to thousands, we present a systematic approach compatible with high-performance computing infrastructures using the MapReduce model and task-based parallelization. We implement the approach in software and demonstrate linear computational scaling with respect to the number of spectral components, using multidimensional experimental materials characterization datasets from photoemission spectroscopy and powder electron diffraction as benchmarks. Our approach enables efficient generation of high-quality data annotation and online spectral analysis, and is applicable to a variety of analytical techniques in materials science and chemistry as a building block for closed-loop experimental systems.
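The MapReduce-style, task-based parallelization described above can be illustrated with a minimal sketch (not the authors' implementation): each spectrum is an independent fitting task distributed across workers, and the fitted parameters are collected afterwards. The Gaussian line shape and the `fit_batch` helper below are illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    # single spectral component; real spectra would sum several of these
    return amp * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def fit_one(spectrum):
    x, y = spectrum
    # seed the optimizer from the data itself
    p0 = [y.max(), x[np.argmax(y)], 1.0]
    popt, _ = curve_fit(gaussian, x, y, p0=p0)
    return popt

def fit_batch(spectra, workers=4):
    # "map": every spectrum is an independent fitting task run in parallel;
    # "reduce": gather the fitted parameters into one annotation list
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_one, spectra))
```

Because the per-spectrum fits share no state, the same pattern scales out to process pools or cluster schedulers on HPC infrastructure.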
We propose a novel data-driven approach for analyzing synchrotron Laue X-ray microdiffraction scans based on machine learning algorithms. The basic architecture and major components of the method are formulated mathematically. We demonstrate it on typical examples including polycrystalline BaTiO$_3$, multiphase transforming alloys, and finely twinned martensite. The computational pipeline is implemented for beamline 12.3.2 at the Advanced Light Source, Lawrence Berkeley National Laboratory. The conventional analytical pathway for X-ray diffraction scans is based on a slow pattern-by-pattern crystal indexing process. This work provides a new way of analyzing 2D X-ray diffraction patterns, independent of the indexing process, and motivates further study of X-ray diffraction patterns from the machine learning perspective for the development of suitable feature extraction, clustering, and labeling algorithms.
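An indexing-free pipeline of the kind sketched in this abstract boils down to feature extraction from each 2D pattern followed by clustering. The sketch below is an illustrative stand-in, not the paper's method: it assumes an azimuthally averaged radial profile as the feature and k-means as the clustering step.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def radial_profile(pattern, nbins=32):
    # azimuthally averaged intensity: a simple rotation-invariant feature
    h, w = pattern.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(xx - (w - 1) / 2, yy - (h - 1) / 2)
    idx = np.minimum((r / (r.max() + 1e-9) * nbins).astype(int), nbins - 1)
    sums = np.bincount(idx.ravel(), weights=pattern.ravel(), minlength=nbins)
    counts = np.bincount(idx.ravel(), minlength=nbins)
    return sums / np.maximum(counts, 1)

def cluster_patterns(patterns, k):
    # feature extraction followed by k-means clustering of the scan
    feats = np.array([radial_profile(p) for p in patterns])
    np.random.seed(0)  # kmeans2 draws its initialization from the global RNG
    _, labels = kmeans2(feats, k, minit='++')
    return labels
```

Grouping patterns this way partitions a scan into candidate grains or phases without ever running the per-pattern indexing step.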
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but also for time-series and multi-channel data. The system was designed primarily for workloads that build connectomes---neural connectivity maps of the brain---using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at http://openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems---reads to parallel disk arrays and writes to solid-state storage---to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
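A common way to partition a spatial index across cluster nodes, in the spirit of the design above, is a Z-order (Morton) space-filling curve: interleaving the bits of the 3-d coordinates yields keys under which spatially nearby cuboids tend to land on the same node. This is an illustrative sketch under that assumption, not the system's actual partitioning code.

```python
def morton3(x, y, z, bits=10):
    # interleave the bits of the 3-d coordinates into a Z-order (Morton)
    # key, so that nearby cuboids receive nearby key values
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def node_for_cuboid(x, y, z, n_nodes, bits=10):
    # split the key space into contiguous ranges, one per cluster node
    return morton3(x, y, z, bits) * n_nodes >> (3 * bits)
```

Range queries over a spatial region then decompose into a small number of contiguous key ranges, each served by one node's local storage.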
Recent advances in the synthesis of polar molecular materials have produced practical alternatives to ferroelectric ceramics, opening up exciting new avenues for their incorporation into modern electronic devices. However, in order to realize the full potential of polar polymer and molecular crystals for modern technological applications, it is paramount to assemble and evaluate all the available data for such compounds, identifying descriptors that could be associated with the emergence of ferroelectricity. In this work, we utilized data-driven approaches to judiciously shortlist candidate materials from a wide chemical space that could possess ferroelectric functionalities. An importance-sampling based method was utilized to address the challenge of having a limited amount of available data on already known organic ferroelectrics. Sets of molecular- and crystal-level descriptors were combined with a Random Forest Regression algorithm in order to predict the spontaneous polarization of the shortlisted compounds with an average error of ~20%.
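The descriptor-to-property regression step can be sketched with scikit-learn's random forest on synthetic data. The descriptor table and target below are fabricated stand-ins (the real work uses molecular- and crystal-level descriptors of organic ferroelectrics); the sketch only shows the shape of the workflow and how an average percentage error is computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# hypothetical descriptor table: 4 molecular/crystal-level descriptors per compound
X = rng.uniform(size=(200, 4))
# synthetic stand-in for spontaneous polarization, nonlinear in two descriptors
y = 5.0 + 10.0 * X[:, 0] + 5.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# train on 150 compounds, hold out 50 for evaluation
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:150], y[:150])

pred = model.predict(X[150:])
# mean absolute percentage error, the "average error" metric quoted above
mape = np.mean(np.abs(pred - y[150:]) / y[150:])
```

With scarce training data, as for known organic ferroelectrics, the abstract's importance-sampling step would reweight or augment the training set before this fit; that step is omitted here.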
The standard noise model in gravitational wave (GW) data analysis assumes detector noise is stationary and Gaussian distributed, with a known power spectral density (PSD) that is usually estimated using clean off-source data. Real GW data often depart from these assumptions, and misspecified parametric models of the PSD could result in misleading inferences. We propose a Bayesian semiparametric approach to improve this. We use a nonparametric Bernstein polynomial prior on the PSD, with weights attained via a Dirichlet process distribution, and update this using the Whittle likelihood. Posterior samples are obtained using a blocked Metropolis-within-Gibbs sampler. We simultaneously estimate the reconstruction parameters of a rotating core collapse supernova GW burst that has been embedded in simulated Advanced LIGO noise. We also discuss an approach to deal with non-stationary data by breaking longer data streams into smaller and locally stationary components.
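The two core ingredients named above, a Bernstein polynomial expansion of the PSD and the Whittle likelihood, can be written down compactly. This is a minimal sketch of those two pieces only (the Dirichlet process prior on the weights and the blocked Metropolis-within-Gibbs sampler are omitted); the function names are mine, not the paper's.

```python
import numpy as np
from scipy.stats import beta

def bernstein_psd(weights, u):
    # PSD on normalized frequency u in [0, 1], expanded in the Bernstein
    # basis: a mixture of Beta(j, k-j+1) densities with nonnegative weights
    k = len(weights)
    return sum(w * beta.pdf(u, j, k - j + 1)
               for j, w in enumerate(weights, start=1))

def whittle_loglik(periodogram, psd):
    # Whittle approximation: periodogram ordinates are (asymptotically)
    # independent exponentials with mean equal to the PSD
    return -np.sum(np.log(psd) + periodogram / psd)
```

A sampler then updates the weight vector (drawn from the Dirichlet process prior) by scoring each proposal with `whittle_loglik` evaluated at the corresponding `bernstein_psd`; equal weights recover a flat (white-noise) PSD.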
Investment in brighter sources and larger and faster detectors has accelerated the speed of data acquisition at national user facilities. The accelerated data acquisition offers many opportunities for discovery of new materials, but it also presents a daunting challenge. The rate of data acquisition far exceeds the current speed of data quality assessment, resulting in less than optimal data and data coverage, which in extreme cases forces recollection of data. Herein, we show how this challenge can be addressed through development of an approach that makes routine data assessment automatic and instantaneous. By extracting and visualizing customized attributes in real time, data quality and coverage, as well as other scientifically relevant information contained in large datasets, are highlighted. Deployment of such an approach not only improves the quality of data but also helps optimize usage of expensive characterization resources by prioritizing measurements of highest scientific impact. We anticipate that our approach will become a starting point for a sophisticated decision tree that optimizes data quality and maximizes scientific content in real time through automation. With these efforts to integrate more automation in data collection and analysis, we can truly take advantage of the accelerating speed of data acquisition.
Rui Patrick Xian, Ralph Ernstorfer, Philipp Michael Pelz (2021). "Scalable multicomponent spectral analysis for high-throughput data annotation".