As current Noisy Intermediate Scale Quantum (NISQ) devices suffer from decoherence errors, any delay in the instruction execution of the quantum control microarchitecture can lead to the loss of quantum information and incorrect computation results. Hence, it is crucial for the control microarchitecture to issue quantum operations to the Quantum Processing Unit (QPU) in time. As in classical microarchitecture, parallelism in quantum programs needs to be exploited for speedup. However, three challenges emerge in the quantum scenario: 1) quantum feedback control can introduce significant pipeline stall latency; 2) timing control is required for all quantum operations; 3) the QPU requires a deterministic operation supply to prevent the accumulation of quantum errors. In this paper, we propose a novel control microarchitecture design to exploit Circuit Level Parallelism (CLP) and Quantum Operation Level Parallelism (QOLP). First, we develop a Multiprocessor architecture to exploit CLP, which supports dynamic scheduling of different sub-circuits. This architecture can handle parallel feedback control and minimize the potential overhead that disrupts the timing control. Second, we propose a Quantum Superscalar approach that exploits QOLP by efficiently executing massive numbers of quantum instructions in parallel. Both methods issue quantum operations to the QPU deterministically. In the benchmark test of a Shor syndrome measurement, a six-core implementation of our proposal achieves up to 2.59$\times$ speedup compared with a single core. For various canonical quantum computing algorithms, our superscalar approach achieves an average of 4.04$\times$ improvement over a baseline design. Finally, we perform a simultaneous randomized benchmarking (simRB) experiment on a real QPU using the proposed microarchitecture for validation.
This paper summarizes the idea of Subarray-Level Parallelism (SALP) in DRAM, which was published in ISCA 2012, and examines the work's significance and future potential. Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the high latency of off-chip memory. Adding more banks to the system to mitigate this problem incurs high system cost. Our goal in this work is to achieve the benefits of increasing the number of banks with a low-cost approach. To this end, we propose three new mechanisms, SALP-1, SALP-2, and MASA (Multitude of Activated Subarrays), to reduce the serialization of different requests that go to the same bank. The key observation exploited by our mechanisms is that a modern DRAM bank is implemented as a collection of subarrays that operate largely independently while sharing few global peripheral structures. Our three proposed mechanisms mitigate the negative impact of bank serialization by overlapping different components of the bank access latencies of multiple requests that go to different subarrays within the same bank. SALP-1 requires no changes to the existing DRAM structure, and only needs to reinterpret some of the existing DRAM timing parameters. SALP-2 and MASA require only modest changes (< 0.15% area overhead) to the DRAM peripheral structures, which are much less design constrained than the DRAM core. Our evaluations show that SALP-1, SALP-2 and MASA significantly improve performance for both single-core systems (7%/13%/17%) and multi-core systems (15%/16%/20%), averaged across a wide range of workloads. We also demonstrate that our mechanisms can be combined with application-aware memory request scheduling in multicore systems to further improve performance and fairness.
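The benefit of overlapping latency components across subarrays can be illustrated with a toy timing model. The sketch below is not the paper's evaluation methodology: it assumes that the overlapped components are the precharge of the first request's subarray and the activation of the second request's subarray, and the function name `second_request_latency` and the 13.75 ns timing values are illustrative assumptions.

```python
def second_request_latency(t_rcd, t_cl, t_rp, overlap):
    """Time (ns) until the second of two same-bank requests returns data.

    Toy model: each request needs an activation (t_rcd) followed by a
    column access (t_cl); the first request's row must be precharged (t_rp).
    With overlap=False the second activation waits for that precharge.
    With overlap=True (assumed: requests hit different subarrays) the
    second activation starts while the first subarray is precharging.
    """
    first_done = t_rcd + t_cl
    start_second = first_done if overlap else first_done + t_rp
    return start_second + t_rcd + t_cl


# Illustrative timing values in nanoseconds (assumptions, not from the paper).
T_RCD = T_CL = T_RP = 13.75

serial = second_request_latency(T_RCD, T_CL, T_RP, overlap=False)
overlapped = second_request_latency(T_RCD, T_CL, T_RP, overlap=True)
saved = serial - overlapped  # latency hidden by the overlap

print(f"serial: {serial} ns, overlapped: {overlapped} ns, saved: {saved} ns")
```

In this simplified model the saving equals exactly one precharge latency per conflicting request pair; real DRAM timing involves further constraints that the model ignores.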
Future universal quantum computers solving problems of practical relevance are expected to require at least $10^6$ qubits, which is a massive scale-up from the present numbers of fewer than 50 qubits operated together. Out of the different types of qubits, solid state qubits are considered to be viable candidates for this scale-up, but interfacing to and controlling such a large number of qubits is a complex challenge that has not been solved yet. One possibility to address this challenge is to use qubit control circuits located close to the qubits at cryogenic temperatures. In this work we evaluate the feasibility of this idea, taking as a reference the physical requirements of a two-electron spin qubit and the specifications of a standard 65 nm complementary metal-oxide-semiconductor (CMOS) process. Using principles and flows from electrical systems engineering, we provide realistic estimates of the footprint and of the power consumption of a complete control-circuit architecture. Our results show that, with further research, our concept can provide scalable electrical control in the vicinity of the qubit.
The execution of quantum circuits on real systems has largely been limited to those which are simply time-ordered sequences of unitary operations followed by a projective measurement. As hardware platforms for quantum computing continue to mature in size and capability, it is imperative to enable quantum circuits beyond their conventional construction. Here we break into the realm of dynamic quantum circuits on a superconducting-based quantum system. Dynamic quantum circuits involve not only the evolution of the quantum state throughout the computation, but also periodic measurements of a subset of qubits mid-circuit and concurrent processing of the resulting classical information within timescales shorter than the execution times of the circuits. Using noisy quantum hardware, we explore one of the most fundamental quantum algorithms, quantum phase estimation, in its adaptive version, which exploits dynamic circuits, and compare the results to a non-adaptive implementation of the same algorithm. We demonstrate that the version of real-time quantum computing with dynamic circuits can offer a substantial and tangible advantage when noise and latency are sufficiently low in the system, opening the door to a new realm of available algorithms on real quantum systems.
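The role of mid-circuit measurement and feedback in adaptive phase estimation can be sketched classically. The code below is a noiseless illustration of the standard iterative scheme with a single ancilla, not the authors' implementation: the function name `adaptive_phase_estimation` is hypothetical, and instead of sampling measurement outcomes it simply takes the most likely one, which is exact when the phase has an `m`-bit binary expansion.

```python
import math


def adaptive_phase_estimation(phi, m):
    """Estimate phi in [0, 1) to m bits, least significant bit first.

    Round k probes the phase 2**(k-1) * phi on the ancilla. Before the
    measurement, a feedback rotation built from the already measured,
    less significant bits removes their contribution, so the outcome
    encodes bit k alone.
    """
    bits = {}
    for k in range(m, 0, -1):
        # Feedback angle (in turns) from previously measured bits k+1..m.
        omega = sum(bits[j] / 2 ** (j - k + 1) for j in range(k + 1, m + 1))
        # Relative phase on the ancilla after the controlled unitary and feedback.
        theta = 2 * math.pi * (2 ** (k - 1) * phi - omega)
        # Probability of measuring 1 after the final Hadamard.
        p_one = math.sin(theta / 2) ** 2
        # Noiseless shortcut: keep the most likely outcome instead of sampling.
        bits[k] = 1 if p_one > 0.5 else 0
    return sum(bits[k] / 2 ** k for k in range(1, m + 1))


estimate = adaptive_phase_estimation(0.625, 3)  # 0.625 = 0.101 in binary
print(estimate)
```

Each round depends on classical results from earlier rounds, which is why the feedback must complete within the circuit's execution time on real hardware.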
In this work we analyze the implementation of a control-phase gate through the resonance between the $|11\rangle$ and $|20\rangle$ states of two statically coupled transmons. We find that there are many different controls for the transmon frequency that implement the same gate with fidelities around $99.8\%$ ($T_1=T_2^{*}=17$ $\mu$s) and $99.99\%$ ($T_1=T_2^{*}=300$ $\mu$s) within a time that approaches the theoretical limit. All controls can be brought to this accuracy by calibrating the waiting time and the destination frequency near the $|11\rangle-|20\rangle$ resonance. However, some controls, such as those based on the theory of dynamical invariants, are particularly attractive due to reduced leakage, robustness against decoherence, and their limited bandwidth.
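The mechanism behind this gate can be illustrated with an idealized two-level model. The sketch below assumes exact resonance, no decoherence, an instantaneous frequency jump, and an effective $|11\rangle$-$|20\rangle$ coupling of $\sqrt{2}\,g$; the symbol `g`, its 10 MHz value, and the function name `evolve_11_20` are assumptions introduced for illustration and do not come from the paper.

```python
import numpy as np


def evolve_11_20(g, t):
    """Propagator in the {|11>, |20>} subspace at resonance (hbar = 1).

    Assumes an effective exchange coupling sqrt(2) * g between the two
    states, with g in rad/s and t in seconds.
    """
    H = np.sqrt(2) * g * np.array([[0, 1], [1, 0]], dtype=complex)
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T


g = 2 * np.pi * 10e6                 # hypothetical coupling: 10 MHz, in rad/s
t_gate = np.pi / (np.sqrt(2) * g)    # one full |11> -> |20> -> |11> cycle
U = evolve_11_20(g, t_gate)

amp_11 = U[0, 0]             # amplitude returning to |11>, ideally -1
leak = abs(U[1, 0]) ** 2     # population left in |20>, ideally 0

print(f"gate time: {t_gate * 1e9:.1f} ns, amplitude on |11>: {amp_11:.3f}")
```

After one full cycle the $|11\rangle$ state returns with a sign flip while the other computational states are unaffected in this idealization, which is the conditional phase of $\pi$. The waiting time $\pi/(\sqrt{2}\,g)$ is the idealized duration that realistic, bandwidth-limited controls can only approach.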
In this work, we develop a method to design control pulses for fixed-frequency superconducting qubits coupled via tunable couplers based on local control theory, an approach commonly employed to steer chemical reactions. Local control theory provides an algorithm for the monotonic population transfer from a selected initial state to a desired final state of a quantum system through the on-the-fly shaping of an external pulse. The method, which requires only a single forward time propagation of the system wavefunction, can serve as a starting point for additional refinements that lead to new pulses with improved properties. Among others, we propose an algorithm for the design of pulses that can transfer population in a reversible manner between given initial and final states of coupled fixed-frequency superconducting qubits.