We present a simulation-based study using deep convolutional neural networks (DCNNs) to identify neutrino interaction vertices in the MINERvA passive targets region, and illustrate the application of domain adversarial neural networks (DANNs) in this context. DANNs are designed to be trained in one domain (simulated data) but tested in a second domain (physics data); they use unlabeled data from the second domain so that, during training, only features that cannot discriminate between the domains are promoted. MINERvA is a neutrino-nucleus scattering experiment using the NuMI beamline at Fermilab. $A$-dependent cross sections are an important part of the physics program, and these measurements require vertex finding in complicated events. To illustrate the impact of the DANN, we used a modified simulation sample in place of physics data during training and then used the labels of the modified simulation when evaluating the DANN. We find that deep-learning-based methods offer significant advantages over our prior track-based reconstruction for the task of vertex finding, and that DANNs are able to improve the performance of deep networks by leveraging available unlabeled data and by mitigating network performance degradation rooted in biases in the physics models used for training.
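The abstract does not spell out how domain-discriminating features are suppressed. A common construction for DANNs is a gradient reversal layer placed between the shared feature extractor and the domain classifier. The following is a minimal NumPy sketch of that idea, not the MINERvA implementation; the class name `GradientReversal`, the helper `feature_gradient`, and the weight `lam` are hypothetical names chosen for illustration.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam on the way back.

    Assumption: `lam` is an illustrative trade-off weight, not a value from the paper.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return features

    def backward(self, grad_output):
        # The domain classifier's gradient is sign-flipped before it reaches
        # the feature extractor, so the extractor is pushed to make the
        # domains harder to tell apart.
        return -self.lam * grad_output


def feature_gradient(grad_label_head, grad_domain_head, grl):
    """Total gradient reaching the shared feature extractor.

    The label head (trained on labeled simulation) contributes normally;
    the domain head (trained on both domains) contributes through the
    reversal layer.
    """
    return grad_label_head + grl.backward(grad_domain_head)
```

In this sketch, the domain head itself still learns to separate the domains, while the shared features receive the opposite update; features that help only the domain head are therefore discouraged.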
Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting.
In liquid argon time projection chambers exposed to neutrino beams and running on or near surface levels, cosmic muons and other cosmic particles are incident on the detectors while a single neutrino-induced event is being recorded. In practice, this means that data from surface liquid argon time projection chambers will be dominated by cosmic particles, both as a source of event triggers and as the majority of the particle count in true neutrino-triggered events. In this work, we demonstrate a novel application of deep learning techniques to remove these background particles by applying semantic segmentation on full detector images from the SBND detector, the near detector in the Fermilab Short-Baseline Neutrino Program. We use this technique to identify, at single image-pixel level, whether recorded activity originated from cosmic particles or neutrino interactions.
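As a rough illustration of what a pixel-level classification can be used for, the sketch below turns per-pixel class scores (as a segmentation network might produce) into labels and then zeroes out everything not tagged as neutrino-induced. The three-class layout (`BACKGROUND`, `COSMIC`, `NEUTRINO`), the array shapes, and both function names are assumptions made for this example, not details taken from the SBND analysis.

```python
import numpy as np

# Hypothetical class indices for this sketch.
BACKGROUND, COSMIC, NEUTRINO = 0, 1, 2


def label_pixels(scores):
    """Assign each pixel the class with the highest score.

    scores: array of shape (n_classes, height, width).
    Returns an integer array of shape (height, width).
    """
    return np.argmax(scores, axis=0)


def keep_neutrino_activity(image, labels):
    """Zero out every pixel that is not labeled as neutrino-induced."""
    return np.where(labels == NEUTRINO, image, 0.0)
```

A downstream reconstruction could then run on the masked image only, which is the sense in which the segmentation removes cosmic background.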
Pions constitute nearly $70\%$ of final-state particles in ultra-high-energy collisions. They act as a probe of the statistical properties of Quantum Chromodynamics (QCD) matter, i.e., the Quark Gluon Plasma (QGP) created in relativistic heavy ion collisions (HIC). In addition, direct photons are among the most versatile tools for studying relativistic HIC. They are produced by various mechanisms during the entire space-time history of the strongly interacting system. Direct photons provide a measure of jet quenching when compared with quark or gluon jets. The decay of the $\pi^{0}$ into two photons makes it cumbersome to identify uncorrelated photons from other processes in the electromagnetic calorimeter. We investigate the use of deep learning architectures for the reconstruction and identification of single-particle as well as multi-particle showers produced in the calorimeter by particles created in high-energy collisions. We use electromagnetic shower data at the calorimeter-cell level to train the networks and show improvements in identification and characterization. These networks are fast and computationally inexpensive for particle shower identification and reconstruction in current and future experiments at particle colliders.
Collider bias is a harmful form of sample selection bias that neural networks are ill-equipped to handle. This bias manifests itself when the underlying causal signal is strongly correlated with other confounding signals due to the training data collection procedure. When the confounding signal is easy to learn, deep neural networks will latch onto it, and the resulting model will generalise poorly to in-the-wild test scenarios. We argue herein that the cause of failure is a combination of the deep structure of neural networks and the greedy gradient-driven learning process used, one that prefers easy-to-compute signals when available. We show it is possible to mitigate this by generating bias-decoupled training data using latent adversarial debiasing (LAD), even when the confounding signal is present in 100% of the training data. By training neural networks on these adversarial examples, we can improve their generalisation in collider bias settings. Experiments show state-of-the-art performance of LAD in label-free debiasing, with gains of 76.12% on background-coloured MNIST, 35.47% on foreground-coloured MNIST, and 8.27% on corrupted CIFAR-10.
Gaussian process tomography (GPT) is a method used for obtaining real-time tomographic reconstructions of the plasma emissivity profile in a tokamak, given some model for the underlying physical processes involved. Thanks to its Bayesian formalism, GPT can also be used to perform model selection -- i.e., comparing different models and choosing the one with maximum evidence. However, the computations involved in this particular step may become slow for data with high dimensionality, especially when comparing the evidence for many different models. Using measurements collected by the ASDEX Upgrade Soft X-ray (SXR) diagnostic, we train a convolutional neural network (CNN) to map SXR tomographic projections to the corresponding GPT model whose evidence is highest. We then compare the network's results, and the time required to calculate them, with those obtained through the analytical Bayesian formalism. In addition, we use the network's classifications to produce tomographic reconstructions of the plasma emissivity profile, whose quality we evaluate by comparing their projection into measurement space with the existing measurements themselves.
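To make the model-selection step concrete, the sketch below computes the log evidence for a generic linear-Gaussian model, d = R f + noise with a zero-mean Gaussian process prior on f, and picks the candidate prior with the highest evidence. It is a toy under stated assumptions, not the ASDEX Upgrade pipeline: the squared-exponential kernel, the 1-D positions `x`, the candidate `length_scales`, the projection matrix `R`, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def sq_exp_kernel(x, length_scale, amplitude=1.0):
    """Squared-exponential prior covariance on 1-D positions x (assumed kernel)."""
    diff = x[:, None] - x[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def log_evidence(d, R, K_prior, noise_std):
    """Log marginal likelihood of d = R f + noise, with f ~ N(0, K_prior).

    Marginalising f gives d ~ N(0, R K_prior R^T + noise_std^2 I).
    """
    n = d.size
    cov = R @ K_prior @ R.T + noise_std**2 * np.eye(n)
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    alpha = np.linalg.solve(chol, d)        # so alpha @ alpha = d^T cov^{-1} d
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (alpha @ alpha) - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)


def select_model(d, R, x, length_scales, noise_std):
    """Return the index of the candidate with maximum evidence, plus all scores."""
    scores = np.array(
        [log_evidence(d, R, sq_exp_kernel(x, ls), noise_std) for ls in length_scales]
    )
    return int(np.argmax(scores)), scores
```

Each candidate requires a factorisation of an n-by-n covariance matrix, which suggests why scanning many models on high-dimensional data can be slow and why a classifier that predicts the winning model directly from the measurements is attractive.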