I would like to thank Junk and Lyons (arXiv:2009.06864) for beginning a discussion about replication in high-energy physics (HEP). Junk and Lyons ultimately argue that HEP learned its lessons the hard way through past failures and that other fields could learn from our procedures. They emphasize that experimental collaborations would risk their legacies were they to make a type-I error in a search for new physics, and they outline the vigilance taken to avoid one, such as data blinding and a strict $5\sigma$ threshold. The discussion, however, ignores an elephant in the room: searches for new physics regularly produce anomalies that generate substantial scientific activity but do not replicate with more data.
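To illustrate how strict the $5\sigma$ discovery threshold is, the corresponding one-sided tail probability of a standard normal test statistic can be computed directly. This is a generic statistics sketch, not code from any of the papers discussed:

```python
import math

# One-sided p-value for an n-sigma excess of a standard normal
# test statistic: p = P(Z > n) = erfc(n / sqrt(2)) / 2.
def p_value(n_sigma: float) -> float:
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# The HEP discovery threshold: about 2.9e-7, roughly 1 in 3.5 million.
print(f"5 sigma -> p = {p_value(5.0):.3e}")
```

A $5\sigma$ requirement thus tolerates a far smaller false-positive rate than the $p < 0.05$ convention common in other fields, which corresponds to less than $2\sigma$.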
Recently, much attention has been focused on the replicability of scientific results, causing scientists, statisticians, and journal editors to examine closely their methodologies and publishing criteria. Experimental particle physicists have been aware of the precursors of non-replicable research for many decades and have many safeguards to ensure that published results are as reliable as possible. The experiments require large investments of time and effort to design, construct, and operate. Large collaborations produce and check the results, and many papers are signed by more than three thousand authors. This paper gives an introduction to what experimental particle physics is and to some of the tools used to analyze the data. It describes the procedures used to ensure that results can be computationally reproduced, both by collaborators and by non-collaborators. It describes the status of publicly available data sets and analysis tools that aid in the reproduction and recasting of experimental results. It also describes methods particle physicists use to maximize the reliability of their results, which increases the probability that they can be replicated by other collaborations, or even by the same collaborations with more data and new personnel. Examples of results that were later found to be false are given, some with failed replication attempts and one with alarmingly successful replications. While some characteristics of particle physics experiments are unique, many of the procedures and techniques can be and are used in other fields.
The many ways in which machine and deep learning are transforming the analysis and simulation of data in particle physics are reviewed. The main methods based on boosted decision trees and various types of neural networks are introduced, and cutting-edge applications in the experimental and theoretical/phenomenological domains are highlighted. After describing the challenges in the application of these novel analysis techniques, the review concludes by discussing the interactions between physics and machine learning as a two-way street enriching both disciplines and helping to meet the present and future challenges of data-intensive science at the energy and intensity frontiers.
We present a procedure for reconstructing particle cascades from event data measured in a high energy physics experiment. For evaluating the hypothesis of a specific physics process causing the observed data, all possible reconstructi
New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600--700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
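The quoted latency figures can be cross-checked with simple arithmetic; note that the implied CPU baseline (roughly 1.8 s per inference) is inferred here from the numbers in the abstract rather than stated independently:

```python
# Figures quoted in the abstract for Brainwave-accelerated ResNet-50.
cloud_ms, edge_ms = 60.0, 10.0          # service inference times
cloud_speedup, edge_speedup = 30, 175   # improvement factors over CPU

# Implied CPU-only inference latency, reconstructed from each pair:
cpu_from_cloud = cloud_ms * cloud_speedup   # 1800 ms
cpu_from_edge = edge_ms * edge_speedup      # 1750 ms

# The two estimates agree to within a few percent, so the quoted
# factors are mutually consistent with a ~1.8 s CPU baseline.
print(cpu_from_cloud, cpu_from_edge)
```

Note also that the 600-700 inferences per second from a single FPGA service implies a per-inference service time of 1.4-1.7 ms, well below the 10 ms round-trip edge latency seen by any one client, which is consistent with many CPUs sharing one pipelined accelerator.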
\texttt{GooStats} is a software framework that provides a flexible environment and common tools to implement multi-variate statistical analyses. The framework is built upon the \texttt{CERN ROOT}, \texttt{MINUIT}, and \texttt{GooFit} packages. Running a multi-variate analysis in parallel on graphics processing units yields a huge boost in performance and opens new possibilities. The design and benchmarks of \texttt{GooStats} are presented in this article, along with an illustration of its application to statistical problems.