We present a case study describing efforts to optimise and modernise Modal, the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. This separable method can become unstable in certain cases and so the slower non-separable integral must be calculated instead. We present a discussion of the optimisation of both approaches. We show significant speed-ups of ~100x, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3x and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38x. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.
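The dimensional reduction that separability provides can be illustrated with a toy example. The following is a minimal sketch, not the Modal code: it assumes a rectangular (cubic) grid with a single hypothetical 1D weight vector `w`, and it represents each function as a sum of products of 1D factors. The function names and the mode layout are illustrative assumptions. Under those assumptions the weighted 3D inner product factorises into products of 1D sums.

```python
import numpy as np

def inner_product_direct(a_modes, b_modes, w):
    """Brute-force weighted 3D sum: sum_ijk w_i w_j w_k A_ijk B_ijk.

    Each entry of a_modes / b_modes is a tuple (f, g, h) of 1D arrays,
    and the full 3D function is the sum of the outer products f_i g_j h_k.
    """
    A = sum(np.einsum("i,j,k->ijk", f, g, h) for f, g, h in a_modes)
    B = sum(np.einsum("i,j,k->ijk", f, g, h) for f, g, h in b_modes)
    W = np.einsum("i,j,k->ijk", w, w, w)
    return float(np.sum(W * A * B))

def inner_product_separable(a_modes, b_modes, w):
    """Same quantity using only 1D sums.

    The product of two separable terms is itself separable, so each pair
    of modes contributes a product of three 1D weighted dot products.
    """
    total = 0.0
    for fa, ga, ha in a_modes:
        for fb, gb, hb in b_modes:
            total += (np.dot(w, fa * fb)
                      * np.dot(w, ga * gb)
                      * np.dot(w, ha * hb))
    return float(total)

# Toy data: 3 modes for A, 2 modes for B, on a 12-point grid per axis.
rng = np.random.default_rng(0)
n = 12
w = rng.random(n)
a_modes = [tuple(rng.standard_normal(n) for _ in range(3)) for _ in range(3)]
b_modes = [tuple(rng.standard_normal(n) for _ in range(3)) for _ in range(2)]

direct = inner_product_direct(a_modes, b_modes, w)
separable = inner_product_separable(a_modes, b_modes, w)
```

In this sketch the direct route touches all `n**3` grid points, whereas the separable route costs on the order of `n` operations per pair of modes. The non-rectangular domain and sparse grid described in the abstract are not modelled here.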
We present an efficient separable approach to the estimation and reconstruction of the bispectrum and the trispectrum from observational (or simulated) large scale structure data. This is developed from general CMB (poly-)spectra methods which exploit the fact that the bispectrum and trispectrum in the literature can be represented by a separable mode expansion which converges rapidly (with $n_\textrm{max}={\cal{O}}(30)$ terms). With an effective grid resolution $l_\textrm{max}$ (number of particles/grid points $N=l_\textrm{max}^3$), we present a bispectrum estimator which requires only ${\cal O}(n_\textrm{max} \times l_\textrm{max}^3)$ operations, along with a corresponding method for direct bispectrum reconstruction. This method is extended to the trispectrum revealing an estimator which requires only ${\cal O}(n_\textrm{max}^{4/3} \times l_\textrm{max}^3)$ operations. The complexity in calculating the trispectrum in this method is now involved in the original decomposition and orthogonalisation process which need only be performed once for each model. However, for non-diagonal trispectra these processes present little extra difficulty and may be performed in ${\cal O}(l_\textrm{max}^4)$ operations. A discussion of how the methodology may be applied to the quadspectrum is also given. An efficient algorithm for the generation of arbitrary nonGaussian initial conditions for use in N-body codes using this separable approach is described. This prescription allows for the production of nonGaussian initial conditions for arbitrary bispectra and trispectra. A brief outline of the key issues involved in parameter estimation, particularly in the non-linear regime, is also given.
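The reason a separable mode makes the estimator cheap can be seen in a one-dimensional periodic toy model. The sketch below is an illustration under stated assumptions, not the estimator of the abstract: it omits normalisation and any power-spectrum weighting, and the function names and the mode functions `q_p`, `q_r`, `q_s` are hypothetical. It checks that a sum over closed "triangles" $k_1+k_2+k_3=0$ of a separable weight times three Fourier modes equals a single real-space sum over the product of three filtered fields.

```python
import numpy as np

def triangle_sum_direct(delta_k, q_p, q_r, q_s):
    """O(n^2) brute force over all k1, k2 with k3 = -(k1 + k2) mod n."""
    n = len(delta_k)
    total = 0.0 + 0.0j
    for k1 in range(n):
        for k2 in range(n):
            k3 = (-k1 - k2) % n  # closure condition on the periodic grid
            total += ((q_p[k1] * delta_k[k1])
                      * (q_r[k2] * delta_k[k2])
                      * (q_s[k3] * delta_k[k3]))
    return total

def triangle_sum_separable(delta_k, q_p, q_r, q_s):
    """Same sum via three inverse FFTs and one pointwise product.

    Each filtered field is M(x) = ifft(q(k) * delta(k)). With NumPy's
    1/n normalisation of ifft, the triangle sum equals
    n**2 * sum_x M_p(x) M_r(x) M_s(x).
    """
    n = len(delta_k)
    m_p = np.fft.ifft(q_p * delta_k)
    m_r = np.fft.ifft(q_r * delta_k)
    m_s = np.fft.ifft(q_s * delta_k)
    return n**2 * np.sum(m_p * m_r * m_s)

# Toy data: a random real field and three arbitrary mode functions of k.
rng = np.random.default_rng(0)
n = 16
delta_k = np.fft.fft(rng.standard_normal(n))
q_p, q_r, q_s = (rng.standard_normal(n) for _ in range(3))

direct = triangle_sum_direct(delta_k, q_p, q_r, q_s)
fast = triangle_sum_separable(delta_k, q_p, q_r, q_s)
```

On a three-dimensional grid the same identity holds with the total number of grid points in place of `n`, so each mode costs a few FFTs plus one pass over the grid. That is the intuition behind an operation count that scales with the number of modes times the number of grid points.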
Innovations in Next-Generation Sequencing are enabling generation of DNA sequence data at ever faster rates and at very low cost. Large sequencing centers typically employ hundreds of such systems. Such high-throughput and low-cost generation of data underscores the need for commensurate acceleration in downstream computational analysis of the sequencing data. A fundamental step in downstream analysis is mapping of the reads to a long reference DNA sequence, such as a reference human genome. Sequence mapping is a compute-intensive step that accounts for more than 30% of the overall time of the GATK workflow. BWA-MEM is one of the most widely used tools for sequence mapping and has tens of thousands of users. In this work, we focus on accelerating BWA-MEM through an efficient architecture-aware implementation, while maintaining identical output. The volume of data requires a distributed computing environment, usually deploying multicore processors. Since the application can be easily parallelized for distributed memory systems, we focus on performance improvements on a single-socket multicore processor. BWA-MEM run time is dominated by three kernels, collectively responsible for more than 85% of the overall compute time. We improved the performance of these kernels by 1) improving cache reuse, 2) simplifying the algorithms, 3) replacing small fragmented memory allocations with a few large contiguous ones, 4) software prefetching, and 5) SIMD utilization wherever applicable, together with a massive reorganization of the source code to enable these improvements. As a result, we achieved nearly 2x, 183x, and 8x speedups on the three kernels, respectively, resulting in up to 3.5x and 2.4x speedups on end-to-end compute time over the original BWA-MEM on a single thread and a single socket, respectively, of an Intel Xeon Skylake processor. To the best of our knowledge, this is the highest reported speedup over BWA-MEM.
Estimates of higher-order contributions for perturbative series in QCD, in view of their asymptotic nature, are delicate, though indispensable for a reliable error assessment in phenomenological applications. In this work, the Adler function and the scalar correlator are investigated, and models for Borel transforms of their perturbative series are constructed, which respect general constraints from the operator product expansion and the renormalisation group. As a novel ingredient, the QCD coupling is employed in the so-called $C$-scheme, which has certain advantages. For the Adler function, previous results obtained directly in the $\overline{\rm MS}$ scheme are supported. Corresponding results for the scalar correlation function are new. It turns out that the substantially larger perturbative corrections for the scalar correlator in $\overline{\rm MS}$ are dominantly due to this scheme choice, and can be largely reduced through more appropriate renormalisation schemes, which are easy to realise in the $C$-scheme.
LiteBIRD, the Lite (Light) satellite for the study of B-mode polarization and Inflation from cosmic background Radiation Detection, is a space mission for primordial cosmology and fundamental physics. JAXA selected LiteBIRD in May 2019 as a strategic large-class (L-class) mission, with its expected launch in the late 2020s using JAXA's H3 rocket. LiteBIRD plans to map the cosmic microwave background (CMB) polarization over the full sky with unprecedented precision. Its main scientific objective is to carry out a definitive search for the signal from cosmic inflation, either making a discovery or ruling out well-motivated inflationary models. The measurements of LiteBIRD will also provide us with an insight into the quantum nature of gravity and other new physics beyond the standard models of particle physics and cosmology. To this end, LiteBIRD will perform full-sky surveys for three years at the Sun-Earth Lagrangian point L2 for 15 frequency bands between 34 and 448 GHz with three telescopes, to achieve a total sensitivity of 2.16 $\mu$K-arcmin with a typical angular resolution of 0.5 deg. at 100 GHz. We provide an overview of the LiteBIRD project, including scientific objectives, mission requirements, top-level system requirements, operation concept, and expected scientific outcomes.
We consider the effect of phase backaction on the correlator $\langle I(t)\, I(t+\tau)\rangle$ for the output signal $I(t)$ from continuous measurement of a qubit. We demonstrate that the interplay between informational and phase backactions in the presence of Rabi oscillations can lead to the correlator becoming larger than 1, even though $|\langle I\rangle|\leq 1$. The correlators can be calculated using the generalized collapse recipe which we validate using the quantum Bayesian formalism. The recipe can be further generalized to the case of multi-time correlators and arbitrary number of detectors, measuring non-commuting qubit observables. The theory agrees well with experimental results for continuous measurement of a transmon qubit. The experimental correlator exceeds the bound of 1 for a sufficiently large angle between the amplified and informational quadratures, causing the phase backaction. The demonstrated effect can be used to calibrate the quadrature misalignment.