Datasets from single-molecule experiments often reflect a large variety of molecular behaviour. The exploration of such datasets can be challenging, especially if knowledge about the data is limited and a priori assumptions about expected data characteristics are to be avoided. Indeed, searching for pre-defined signal characteristics is sometimes useful, but it can also lead to information loss and the introduction of expectation bias. Here, we demonstrate how Transfer Learning-enhanced dimensionality reduction can be employed to identify and quantify hidden features in single-molecule charge transport data in an unsupervised manner. Taking advantage of open-access neural networks trained on millions of seemingly unrelated images, our results also show how Deep Learning methodologies can readily be employed, even when the amount of problem-specific, in-house data is limited.
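A minimal sketch of the general idea, not the authors' exact pipeline: a pretrained open-access image network serves as a fixed feature extractor for images rendered from single-molecule conductance traces, and the resulting deep features are reduced to a low-dimensional map and clustered without labels. The backbone choice, input rendering, and cluster count are all assumptions; `trace_images` is a placeholder.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Pretrained backbone with the ImageNet classification head removed.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(trace_images):
    """trace_images: float tensor (N, 3, H, W), e.g. rendered conductance traces."""
    return backbone(preprocess(trace_images)).numpy()   # (N, 512) deep features

trace_images = torch.rand(16, 3, 128, 128)              # placeholder input
features = embed(trace_images)
coords = PCA(n_components=2).fit_transform(features)    # low-dimensional map
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(coords)  # unsupervised grouping
```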
We describe a novel application of the end-to-end deep learning technique to the task of discriminating top quark-initiated jets from those originating from the hadronization of a light quark or a gluon. The end-to-end deep learning technique combines deep learning algorithms and low-level detector representation of the high-energy collision event. In this study, we use low-level detector information from the simulated CMS Open Data samples to construct the top jet classifiers. To optimize classifier performance we progressively add low-level information from the CMS tracking detector, including pixel detector reconstructed hits and impact parameters, and demonstrate the value of additional tracking information even when no new spatial structures are added. Relying only on calorimeter energy deposits and reconstructed pixel detector hits, the end-to-end classifier achieves an AUC score of $0.975 \pm 0.002$ for the task of classifying boosted top quark jets. After adding derived track quantities, the classifier AUC score increases to $0.9824 \pm 0.0013$, serving as the first performance benchmark for these CMS Open Data samples. We additionally provide a timing performance comparison of different processor unit architectures for training the network.
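An illustrative sketch only, not the CMS end-to-end architecture: a small CNN operates on multi-channel "detector images" (e.g., calorimeter deposits and pixel-hit maps stacked as channels) and is scored with ROC AUC. Channel count, image size, and layer widths are assumptions, and the placeholder validation tensors stand in for real Open Data samples.

```python
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

class JetImageClassifier(nn.Module):
    def __init__(self, in_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):                  # x: (N, C, H, W) stacked detector channels
        return self.net(x).squeeze(-1)     # one raw logit per jet

model = JetImageClassifier()
criterion = nn.BCEWithLogitsLoss()         # binary cross-entropy training (loop omitted)

val_images = torch.rand(32, 4, 64, 64)     # placeholder detector images
val_labels = torch.randint(0, 2, (32,)).float()
with torch.no_grad():
    scores = torch.sigmoid(model(val_images))
auc = roc_auc_score(val_labels.numpy(), scores.numpy())   # AUC as reported in the abstract
```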
We use spatially sparse two-, three- and four-dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and (3+1)-dimensional space-time. We evaluate the resulting latent spaces by testing their usefulness for downstream tasks. Applications include handwriting recognition in 2D, part segmentation of 3D objects, object segmentation in 3D scenes, and body-part segmentation for 4D wire-frame models generated from motion-capture data.
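A simplified dense stand-in for the encoder-decoder shape described above (the actual work relies on spatially sparse convolutions, for which dedicated libraries exist): a 3D convolutional autoencoder whose bottleneck features could feed a downstream segmentation head. Channel sizes and voxel resolution are illustrative.

```python
import torch
import torch.nn as nn

class Conv3dAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, voxels):             # voxels: (N, 1, D, H, W) occupancy grid
        latent = self.encoder(voxels)      # latent space evaluated on downstream tasks
        return self.decoder(latent), latent

model = Conv3dAutoencoder()
voxels = torch.zeros(2, 1, 32, 32, 32)     # placeholder sparse structure, densified
recon, latent = model(voxels)
loss = nn.functional.mse_loss(recon, voxels)   # reconstruction objective
```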
Data-driven prediction and physics-agnostic machine-learning methods have attracted increased interest in recent years, achieving forecast horizons well beyond those expected for chaotic dynamical systems. In a separate strand of research, data assimilation has been successfully used to optimally combine forecast models and their inherent uncertainty with incoming noisy observations. The key idea in our work here is to achieve increased forecast capabilities by judiciously combining machine-learning algorithms and data assimilation. We combine the physics-agnostic data-driven approach of random feature maps as a forecast model within an ensemble Kalman filter data assimilation procedure. The machine-learning model is learned sequentially by incorporating incoming noisy observations. We show that the obtained forecast model has remarkably good forecast skill while being computationally cheap once trained. Going beyond the task of forecasting, we show that our method can be used to generate reliable ensembles for probabilistic forecasting as well as to learn effective model closure in multi-scale systems.
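A minimal numpy sketch of the two ingredients, not the paper's full sequential algorithm: (i) a random-feature-map surrogate x_{n+1} ≈ W φ(x_n) with φ(x) = tanh(Ax + b), fitted by ridge regression, and (ii) a stochastic ensemble Kalman filter analysis step that blends the forecast ensemble with a noisy observation. Dimensions, regularisation, and the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 300                               # state dimension, number of random features
A = rng.uniform(-0.5, 0.5, size=(m, d))     # fixed random weights
b = rng.uniform(-0.5, 0.5, size=m)          # fixed random biases
phi = lambda x: np.tanh(A @ x + b)          # random feature map

def fit_surrogate(X, Y, reg=1e-4):
    """Ridge regression of next states Y (d, N) onto features of current states X (d, N)."""
    Phi = np.stack([phi(x) for x in X.T], axis=1)                    # (m, N)
    return Y @ Phi.T @ np.linalg.inv(Phi @ Phi.T + reg * np.eye(m))  # W: (d, m)

def enkf_analysis(ensemble, obs, H, R):
    """Stochastic EnKF update of a forecast ensemble (d, K) with one noisy observation."""
    K_ens = ensemble.shape[1]
    Xp = ensemble - ensemble.mean(axis=1, keepdims=True)   # ensemble anomalies
    S = H @ Xp
    C = Xp @ S.T / (K_ens - 1)                             # state-observation covariance
    P = S @ S.T / (K_ens - 1) + R                          # innovation covariance
    gain = C @ np.linalg.inv(P)
    perturbed = obs[:, None] + rng.multivariate_normal(np.zeros(len(obs)), R, size=K_ens).T
    return ensemble + gain @ (perturbed - H @ ensemble)

# Toy usage: fit on a short synthetic trajectory, then perform one analysis step.
X = rng.standard_normal((d, 200)); Y = np.roll(X, -1, axis=1)
W = fit_surrogate(X[:, :-1], Y[:, :-1])
ens = rng.standard_normal((d, 20))
fcst = np.stack([W @ phi(ens[:, k]) for k in range(20)], axis=1)     # forecast ensemble
ana = enkf_analysis(fcst, obs=np.zeros(d), H=np.eye(d), R=0.1 * np.eye(d))
```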
The models and weights of previously trained Convolutional Neural Networks (CNNs), created to perform automated isotopic classification of time-sequenced gamma-ray spectra, were utilized to provide source-domain knowledge for training on new domains of potential interest. The previous results were achieved solely using modeled spectral data. In this work we attempt to transfer the knowledge gained to the new, albeit similar, domain of solely measured data. The ability to train on modeled data and predict on measured data will be crucial in any successful data-driven approach to this problem space.
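A hedged sketch of the weight-transfer idea; the architecture, file name, and data pipeline below are illustrative stand-ins, not the authors' code. The source-domain weights learned on modeled spectra are reloaded, the convolutional feature extractor is frozen, and only the classifier head is fine-tuned on the measured-domain spectra.

```python
import torch
import torch.nn as nn

class SpectraCNN(nn.Module):                 # stand-in 1D CNN over gamma-ray spectra
    def __init__(self, n_channels=1024, num_isotopes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.classifier = nn.Linear(32 * (n_channels // 16), num_isotopes)

    def forward(self, x):                    # x: (N, 1, n_channels)
        return self.classifier(self.features(x).flatten(1))

model = SpectraCNN()
# model.load_state_dict(torch.load("modeled_domain_cnn.pt"))  # hypothetical source-domain weights

for p in model.features.parameters():        # freeze representations learned on modeled data
    p.requires_grad = False

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

measured_spectra = torch.rand(64, 1, 1024)   # placeholder measured-domain batch
measured_labels = torch.randint(0, 8, (64,))
loss = criterion(model(measured_spectra), measured_labels)
loss.backward()
optimizer.step()                             # fine-tune the classifier head only
```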
Over the last decade, scanning transmission electron microscopy (STEM) has emerged as a powerful tool for probing atomic structures of complex materials with picometer precision, opening the pathway toward exploring ferroelectric, ferroelastic, and chemical phenomena at the atomic scale. Analyses to date extracting a polarization signal from lattice-coupled distortions in STEM imaging rely on discovery of atomic positions from intensity maxima/minima and subsequent calculation of polarization and other order parameter fields from the atomic displacements. Here, we explore the feasibility of polarization mapping directly from the analysis of STEM images using deep convolutional neural networks (DCNNs). In this approach, the DCNN is trained on the human-labeled part of the image, and the trained network is subsequently applied to other images. We explore the effects of the choice of the descriptors (centered on atomic columns and grid-based), the effects of observational bias, and whether the network trained on one composition can be applied to a different one. This analysis demonstrates the tremendous potential of DCNNs for the analysis of high-resolution STEM imaging and spectral data and highlights the associated limitations.
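An illustrative sketch only; the descriptor choice, patch size, label encoding, and network depth are assumptions rather than the authors' implementation. A small DCNN learns a polarization label from image patches cut from the human-labeled region of a STEM image and is then swept over unlabeled images or different compositions.

```python
import torch
import torch.nn as nn

class PatchPolarizationNet(nn.Module):
    def __init__(self, patch=32, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * (patch // 4) ** 2, n_classes),  # e.g. polarization variants
        )

    def forward(self, patches):              # patches: (N, 1, patch, patch), atom- or grid-centered
        return self.net(patches)

model = PatchPolarizationNet()
patches = torch.rand(8, 1, 32, 32)           # placeholder patches from the labeled image region
logits = model(patches)                      # after training, applied to unlabeled images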