We reframe common tasks in jet physics in probabilistic terms, including jet reconstruction, Monte Carlo tuning, matrix-element/parton-shower matching for large jet multiplicity, and efficient event generation of jets in complex, signal-like regions of phase space. We also introduce Ginkgo, a simplified generative model for jets that facilitates research into these tasks with techniques from statistics, machine learning, and combinatorial optimization. We review some of the recent research in this direction that has been enabled with Ginkgo. We show how probabilistic programming can be used to efficiently sample the showering process, how a novel trellis algorithm can be used to efficiently marginalize over the enormous number of clustering histories for the same observed particles, and how dynamic programming, A* search, and reinforcement learning can be used to find the maximum-likelihood clustering in this enormous search space. This work builds bridges with work in hierarchical clustering, statistics, combinatorial optimization, and reinforcement learning.
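As a rough illustration of the scale of the marginalization problem described above, the following sketch (not the actual Ginkgo model; the splitting rule and cutoff are made up for illustration) generates a toy binary-splitting shower and counts the clustering histories compatible with its leaves:

```python
# Toy sketch of a generative binary-splitting model for jets, and the
# number of clustering histories an exact marginalization must sum over.
# This is NOT the Ginkgo model; the splitting rule is illustrative only.
import random
from math import prod

def split(energy, cutoff=1.0):
    """Recursively split an 'energy' into a binary tree of leaves."""
    if energy < cutoff:
        return energy                      # leaf: an observed particle
    z = random.uniform(0.1, 0.9)           # toy momentum fraction of one child
    return (split(z * energy, cutoff), split((1 - z) * energy, cutoff))

def leaves(tree):
    return [tree] if not isinstance(tree, tuple) else leaves(tree[0]) + leaves(tree[1])

def num_histories(n):
    """Number of binary clustering histories over n leaves: (2n-3)!!"""
    return prod(range(1, 2 * n - 2, 2))

jet = split(20.0)
n = len(leaves(jet))
print(f"{n} leaves, {num_histories(n)} possible clustering histories")
```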
We propose a new scientific application of unsupervised learning techniques to boost our ability to search for new phenomena in data by detecting discrepancies between two datasets. These could be, for example, a simulated Standard Model background and an observed dataset containing a potential hidden signal of New Physics. We build a statistical test upon a test statistic which measures deviations between two samples, using a nearest-neighbors approach to estimate the local ratio of the density of points. The test is model-independent and non-parametric, requiring no knowledge of the shape of the underlying distributions, and it does not bin the data, thus retaining full information from the multidimensional feature space. As a proof of concept, we apply our method to synthetic Gaussian data and to a simulated dark matter signal at the Large Hadron Collider. Even in the case where the background cannot be simulated accurately enough to claim discovery, the technique is a powerful tool to identify regions of interest for further study.
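The core of such a test can be sketched in a few lines; the estimator below (k-nearest-neighbor radii converted to a local density ratio, with a permutation test for calibration) is an illustrative stand-in for the paper's construction, and the function names, the choice of k, and the use of the mean log ratio as a test statistic are assumptions:

```python
# Minimal sketch of a two-sample test built on k-nearest-neighbor
# density-ratio estimates, calibrated with a permutation test.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_radius(points, queries, k):
    """Distance to the k-th nearest neighbor of each query point."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    dist, _ = nn.kneighbors(queries)
    return dist[:, -1]

def test_statistic(bkg, data, k=20):
    """Mean log of the local density ratio, estimated at the data points."""
    d = data.shape[1]
    r_data = knn_radius(data, data, k + 1)   # +1 to skip the point itself
    r_bkg  = knn_radius(bkg,  data, k)
    ratio = (len(bkg) * r_bkg**d) / (len(data) * r_data**d)
    return np.mean(np.log(ratio))

def p_value(bkg, data, k=20, n_perm=200, rng=np.random.default_rng(0)):
    """Fraction of permutations of the pooled sample with a statistic at least as large."""
    t_obs = test_statistic(bkg, data, k)
    pooled = np.vstack([bkg, data])
    t_null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t_null.append(test_statistic(pooled[:len(bkg)], pooled[len(bkg):], k))
    return np.mean(np.array(t_null) >= t_obs)
```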
Our predictions for particle physics processes are realized in a chain of complex simulators. They allow us to generate high-fidelity simulated data, but they are not well-suited for inference on the theory parameters with observed data. We explain why the likelihood function of high-dimensional LHC data cannot be explicitly evaluated, why this matters for data analysis, and reframe what the field has traditionally done to circumvent this problem. We then review new simulation-based inference methods that let us directly analyze high-dimensional data by combining machine learning techniques and information from the simulator. Initial studies indicate that these techniques have the potential to substantially improve the precision of LHC measurements. Finally, we discuss probabilistic programming, an emerging paradigm that lets us extend inference to the latent process of the simulator.
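One way to make the connection to machine learning concrete is the likelihood-ratio trick used by several simulation-based inference methods: a classifier trained to separate events simulated at two parameter points approximates the per-event likelihood ratio. The sketch below uses a toy Gaussian "simulator" and scikit-learn in place of the full simulation chain and a dedicated network, so all specifics are illustrative:

```python
# Minimal sketch of the likelihood-ratio trick behind many simulation-based
# inference methods.  The "simulator" here is a toy 2D Gaussian.
import numpy as np
from sklearn.neural_network import MLPClassifier

def simulate(theta, n, rng):
    """Stand-in for the full simulation chain: a 2D Gaussian with mean theta."""
    return rng.normal(loc=theta, scale=1.0, size=(n, 2))

rng = np.random.default_rng(0)
x0 = simulate(0.0, 20_000, rng)          # events generated at theta_0
x1 = simulate(0.5, 20_000, rng)          # events generated at theta_1

X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=50).fit(X, y)

def log_likelihood_ratio(x):
    """log p(x | theta_1) / p(x | theta_0), estimated from the classifier output."""
    s = clf.predict_proba(x)[:, 1]
    return np.log(s / (1.0 - s))

# Sum per-event log ratios over an "observed" dataset:
x_obs = simulate(0.5, 1_000, rng)
print(log_likelihood_ratio(x_obs).sum())
```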
The determination of the fundamental parameters of the Standard Model (and its extensions) is often limited by the presence of statistical and theoretical uncertainties. We present several models for the latter uncertainties (random, nuisance, external) in the frequentist framework, and we derive the corresponding $p$-values. In the case of the nuisance approach, where theoretical uncertainties are modeled as biases, we highlight the important, but arbitrary, issue of the range of variation chosen for the bias parameters. We introduce the concept of adaptive $p$-value, which is obtained by adjusting the range of variation for the bias according to the significance considered, and which allows us to tackle metrology and exclusion tests with a single, well-defined unified tool that exhibits interesting frequentist properties. We discuss how the determination of fundamental parameters is impacted by the model chosen for theoretical uncertainties, illustrating several issues with examples from quark flavour physics.
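A stripped-down numerical version of the nuisance (bias) treatment helps fix ideas: the theory prediction is shifted by a bias confined to a chosen range and the quoted p-value is the most conservative one over that range. The code below is a toy Gaussian example; the range factor r is exactly the arbitrary choice discussed above, and the adaptive prescription, in which r is adjusted with the significance under consideration, is not reproduced here:

```python
# Toy sketch of the nuisance treatment of a theoretical uncertainty:
# the prediction is shifted by a bias within [-r*Delta, +r*Delta] and the
# most conservative p-value over that range is quoted.
from math import erf, sqrt

def p_value_gaussian(x_obs, mu, sigma):
    """Two-sided Gaussian p-value for observing x_obs given mean mu."""
    z = abs(x_obs - mu) / sigma
    return 1.0 - erf(z / sqrt(2.0))

def p_value_with_bias(x_obs, mu, sigma_stat, delta_th, r=1.0, steps=201):
    """Supremum of the p-value as the theory bias scans [-r*delta_th, +r*delta_th]."""
    biases = [(-r + 2.0 * r * i / (steps - 1)) * delta_th for i in range(steps)]
    return max(p_value_gaussian(x_obs, mu + b, sigma_stat) for b in biases)

# Example: measurement 3 statistical sigma from the nominal prediction,
# with a theory uncertainty as large as the statistical one.
print(p_value_with_bias(x_obs=3.0, mu=0.0, sigma_stat=1.0, delta_th=1.0))
```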
We investigate the potential of measuring properties of a heavy resonance X by exploiting jet substructure techniques. Motivated by heavy Higgs boson searches, we focus on the decays of X into a pair of (massive) electroweak gauge bosons. More specifically, we consider a hadronic Z boson, which makes it possible to determine properties of X at an earlier stage. For $m_X$ of ${\cal O}(1)$ TeV, the two quarks from a Z boson would be captured as a merged jet in a significant fraction of events. The use of the merged jet enables us to consider a Z-induced jet as a reconstructed object without any combinatorial ambiguity. We apply a conventional jet substructure method to extract the four-momenta of subjets from a merged jet. We find that jet substructure procedures may enhance features in some kinematic observables formed with subjets. The subjet momenta are fed into the matrix element associated with a given hypothesis on the nature of X, which is further processed to construct a matrix element method (MEM)-based observable. For both moderately and highly boosted Z bosons, we demonstrate that the MEM combined with current jet substructure techniques can be a very powerful discriminator in identifying the physics nature of X. We also discuss the effects of choosing different jet sizes for merged jets and different jet-grooming parameters on the MEM analyses.
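The backbone of such an MEM-based observable can be sketched as a per-event likelihood ratio evaluated on the subjet four-momenta. In the sketch below the helicity angle is computed with an explicit Lorentz boost, while the two $|M|^2$ functions are placeholder angular distributions, not the actual matrix elements for the X hypotheses considered in the text:

```python
# Toy sketch of an MEM-style discriminant built from two subjet four-momenta.
import numpy as np

def boost(p, beta):
    """Transform the four-vector p = (E, px, py, pz) into the frame moving with velocity beta."""
    b2 = float(beta @ beta)
    if b2 == 0.0:
        return p.copy()
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = float(beta @ p[1:])
    E = gamma * (p[0] - bp)
    coef = (gamma - 1.0) * bp / b2 - gamma * p[0]
    return np.concatenate([[E], p[1:] + coef * beta])

def cos_theta_star(p1, p2):
    """Helicity angle of subjet 1 in the rest frame of the two-subjet (Z candidate) system."""
    pz = p1 + p2
    beta = pz[1:] / pz[0]
    p1_rest = boost(p1, beta)
    axis = pz[1:] / np.linalg.norm(pz[1:])
    return float(p1_rest[1:] @ axis) / np.linalg.norm(p1_rest[1:])

def mem_discriminant(p1, p2, me2_sig, me2_alt):
    """MEM-style observable: P_sig / (P_sig + P_alt) for one subjet configuration."""
    c = cos_theta_star(p1, p2)
    return me2_sig(c) / (me2_sig(c) + me2_alt(c))

# Placeholder angular distributions for two hypotheses on the nature of X:
spin0_like = lambda c: 1.0 - c**2   # illustrative only
spin2_like = lambda c: 1.0 + c**2   # illustrative only

p1 = np.array([60.0, 10.0, 20.0, 55.0])   # toy subjet four-momenta (E, px, py, pz)
p2 = np.array([45.0, -5.0, 15.0, 42.0])
print(mem_discriminant(p1, p2, spin0_like, spin2_like))
```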
We suggest that exclusive Higgs + light (or b)-jet production at the LHC, $pp \to h+j(j_b)$, is a rather sensitive probe of the light-quark Yukawa couplings and of other forms of new physics (NP) in the Higgs-gluon $hgg$ and quark-gluon $qqg$ interactions. We study the Higgs $p_T$-distribution in $pp \to h+j(j_b) \to \gamma\gamma + j(j_b)$, i.e., in $h+j(j_b)$ production followed by the Higgs decay $h \to \gamma\gamma$, employing the ($p_T$-dependent) signal strength formalism to probe various types of NP which are relevant to these processes and which we parameterize either as scaled Standard Model (SM) couplings (the kappa framework) and/or through new higher-dimensional effective operators (the SMEFT framework). We find that exclusive $h+j(j_b)$ production at the 13 TeV LHC is sensitive to various NP scenarios, with typical scales ranging from a few TeV to ${\cal O}(10)$ TeV, depending on the flavor, chirality and Lorentz structure of the underlying physics.
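For orientation, the inclusive signal strength in the kappa framework takes a simple closed form; the sketch below implements it for gluon-fusion-dominated $h+j$ production with $h \to \gamma\gamma$, using rounded SM branching ratios. The $p_T$-dependence of the signal strength exploited in the text, and the SMEFT parameterization, are not modelled here:

```python
# Toy sketch of a kappa-framework signal strength for pp -> h + j with
# h -> gamma gamma, assuming gluon-fusion-dominated production.
# The SM branching ratios are rounded illustrative values.
BR_SM = {"bb": 0.58, "WW": 0.21, "gg": 0.08, "tautau": 0.06,
         "cc": 0.03, "ZZ": 0.03, "gammagamma": 0.002, "other": 0.008}

def signal_strength(kappa_g, kappa_gamma, kappa_f=1.0, kappa_V=1.0):
    """mu = kappa_g^2 * kappa_gamma^2 / kappa_H^2, with kappa_H^2 the total-width scaling."""
    scale = {"bb": kappa_f**2, "WW": kappa_V**2, "gg": kappa_g**2,
             "tautau": kappa_f**2, "cc": kappa_f**2, "ZZ": kappa_V**2,
             "gammagamma": kappa_gamma**2, "other": 1.0}
    kappa_H2 = sum(BR_SM[k] * scale[k] for k in BR_SM)
    return kappa_g**2 * kappa_gamma**2 / kappa_H2

print(signal_strength(kappa_g=1.1, kappa_gamma=1.0))
```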