ترغب بنشر مسار تعليمي؟ اضغط هنا

575 - Gushu Li , Anbang Wu , Yunong Shi 2021
The quantum simulation kernel is an important subroutine appearing as a very long gate sequence in many quantum programs. In this paper, we propose Paulihedral, a block-wise compiler framework that can deeply optimize this subroutine by exploiting hi gh-level program structure and optimization opportunities. Paulihedral first employs a new Pauli intermediate representation that can maintain the high-level semantics and constraints in quantum simulation kernels. This naturally enables new large-scale optimizations that are hard to implement at the low gate-level. In particular, we propose two technology-independent instruction scheduling passes, and two technology-dependent code optimization passes which reconcile the circuit synthesis, gate cancellation, and qubit mapping stages of the compiler. Experimental results show that Paulihedral can outperform state-of-the-art compiler infrastructures in a wide-range of applications on both near-term superconducting quantum processors and future fault-tolerant quantum computers.
273 - Gushu Li , Yunong Shi , 2021
Computational chemistry is the leading application to demonstrate the advantage of quantum computing in the near term. However, large-scale simulation of chemical systems on quantum computers is currently hindered due to a mismatch between the comput ational resource needs of the program and those available in todays technology. In this paper we argue that significant new optimizations can be discovered by co-designing the application, compiler, and hardware. We show that multiple optimization objectives can be coordinated through the key abstraction layer of Pauli strings, which are the basic building blocks of computational chemistry programs. In particular, we leverage Pauli strings to identify critical program components that can be used to compress program size with minimal loss of accuracy. We also leverage the structure of Pauli string simulation circuits to tailor a novel hardware architecture and compiler, leading to significant execution overhead reduction by up to 99%. While exploiting the high-level domain knowledge reveals significant optimization opportunities, our hardware/software framework is not tied to a particular program instance and can accommodate the full family of computational chemistry problems with such structure. We believe the co-design lessons of this study can be extended to other domains and hardware technologies to hasten the onset of quantum advantage.
179 - Yuke Wang , Boyuan Feng , Gushu Li 2020
As the emerging trend of graph-based deep learning, Graph Neural Networks (GNNs) excel for their capability to generate high-quality node feature vectors (embeddings). However, the existing one-size-fits-all GNN implementations are insufficient to ca tch up with the evolving GNN architectures, the ever-increasing graph sizes, and the diverse node embedding dimensionalities. To this end, we propose textbf{GNNAdvisor}, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms. First, GNNAdvisor explores and identifies several performance-relevant features from both the GNN model and the input graph, and uses them as a new driving force for GNN acceleration. Second, GNNAdvisor implements a novel and highly-efficient 2D workload management, tailored for GNN computation to improve GPU utilization and performance under different application settings. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by gracefully coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads. Furthermore, to enable automatic runtime optimization, GNNAdvisor incorporates a lightweight analytical model for an effective design parameter search. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library ($3.02times$ faster on average) and NeuGraph (up to $4.10times$ faster), on mainstream GNN architectures across various datasets.
153 - Gushu Li , Yufei Ding , Yuan Xie 2019
More computational resources (i.e., more physical qubits and qubit connections) on a superconducting quantum processor not only improve the performance but also result in more complex chip architecture with lower yield rate. Optimizing both of them s imultaneously is a difficult problem due to their intrinsic trade-off. Inspired by the application-specific design principle, this paper proposes an automatic design flow to generate simplified superconducting quantum processor architecture with negligible performance loss for different quantum programs. Our architecture-design-oriented profiling method identifies program components and patterns critical to both the performance and the yield rate. A follow-up hardware design flow decomposes the complicated design procedure into three subroutines, each of which focuses on different hardware components and cooperates with corresponding profiling results and physical constraints. Experimental results show that our design methodology could outperform IBMs general-purpose design schemes with better Pareto-optimal results.
117 - Gushu Li , Li Zhou , Nengkun Yu 2019
In this paper, we propose Proq, a runtime assertion scheme for testing and debugging quantum programs on a quantum computer. The predicates in Proq are represented by projections (or equivalently, closed subspaces of the state space), following Birkh off-von Neumann quantum logic. The satisfaction of a projection by a quantum state can be directly checked upon a small number of projective measurements rather than a large number of repeated executions. On the theory side, we rigorously prove that checking projection-based assertions can help locate bugs or statistically assure that the semantic function of the tested program is close to what we expect, for both exact and approximate quantum programs. On the practice side, we consider hardware constraints and introduce several techniques to transform the assertions, making them directly executable on the measurement-restricted quantum computers. We also propose to achieve simplified assertion implementation using local projection technique with soundness guaranteed. We compare Proq with existing quantum program assertions and demonstrate the effectiveness and efficiency of Proq by its applications to assert two ingenious quantum algorithms, the Harrow-Hassidim-Lloyd algorithm and Shors algorithm.
As a promising solution to boost the performance of distance-related algorithms (e.g., K-means and KNN), FPGA-based acceleration attracts lots of attention, but also comes with numerous challenges. In this work, we propose AccD, a compiler-based fram ework for accelerating distance-related algorithms on CPU-FPGA platforms. Specifically, AccD provides a Domain-specific Language to unify distance-related algorithms effectively, and an optimizing compiler to reconcile the benefits from both the algorithmic optimization on the CPU and the hardware acceleration on the FPGA. The output of AccD is a high-performance and power-efficient design that can be easily synthesized and deployed on mainstream CPU-FPGA platforms. Intensive experiments show that AccD designs achieve 31.42x speedup and 99.63x better energy efficiency on average over standard CPU-based implementations.
181 - Gushu Li , Yufei Ding , Yuan Xie 2019
To bridge the gap between limited hardware access and the huge demand for experiments for Noisy Intermediate-Scale Quantum (NISQ) computing system study, a simulator which can capture the modeling of both the quantum processor and its classical contr ol system to realize early-stage evaluation and design space exploration, is naturally invoked but still missing. This paper presents SANQ, a Simulation framework for Architecting NISQ computing system. SANQ consists of two components, 1) an optimized noisy quantum computing (QC) simulator with flexible error modeling accelerated by eliminating redundant computation, and 2) an architectural simulation infrastructure to construct behavior models for evaluating the control systems. SANQ is validated with existing NISQ quantum processor and control systems to ensure simulation accuracy. It can capture the variance on the QC device and simulate the timing behavior precisely (<1% and 10% error for various real control systems). Several potential applications are proposed to show that SANQ could benefit the future design of NISQ compiler, architecture, etc.
120 - Gushu Li , Yufei Ding , Yuan Xie 2018
Due to little consideration in the hardware constraints, e.g., limited connections between physical qubits to enable two-qubit gates, most quantum algorithms cannot be directly executed on the Noisy Intermediate-Scale Quantum (NISQ) devices. Dynamica lly remapping logical qubits to physical qubits in the compiler is needed to enable the two-qubit gates in the algorithm, which introduces additional operations and inevitably reduces the fidelity of the algorithm. Previous solutions in finding such remapping suffer from high complexity, poor initial mapping quality, and limited flexibility and controllability. To address these drawbacks mentioned above, this paper proposes a SWAP-based BidiREctional heuristic search algorithm SABRE, which is applicable to NISQ devices with arbitrary connections between qubits. By optimizing every search attempt,globally optimizing the initial mapping using a novel reverse traversal technique, introducing the decay effect to enable the trade-off between the depth and the number of gates of the entire algorithm, SABRE outperforms the best known algorithm with exponential speedup and comparable or better results on various benchmarks.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا