أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Kamran Karimi

Investigating the Performance of an Adiabatic Quantum Optimization Processor

243 - Kamran Karimi , Neil G. Dickson , Firas Hamze 2010

Adiabatic quantum optimization offers a new method for solving hard optimization problems. In this paper we calculate median adiabatic times (in seconds) determined by the minimum gap during the adiabatic quantum optimization for an NP-hard Ising spi n glass instance class with up to 128 binary variables. Using parameters obtained from a realistic superconducting adiabatic quantum processor, we extract the minimum gap and matrix elements using high performance Quantum Monte Carlo simulations on a large-scale Internet-based computing platform. We compare the median adiabatic times with the median running times of two classical solvers and find that, for the considered problem sizes, the adiabatic times for the simulated processor architecture are about 4 and 6 orders of magnitude shorter than the two classical solvers times. This shows that if the adiabatic time scale were to determine the computation time, adiabatic quantum optimization would be significantly superior to those classical solvers for median spin glass problems of at least up to 128 qubits. We also discuss important additional constraints that affect the performance of a realistic system.

فيزياء الكم الأنظمة المضطربة والشبكات العصبية الميكانيكا الإحصائية

A Performance Comparison of CUDA and OpenCL

152 - Kamran Karimi , Neil G. Dickson , 2010

CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, we use complex, near-identical kernels from a Quantum Monte Carlo application to compare the performance of CUDA and OpenCL. We show that when using NVIDIA compiler tools, converting a CUDA kernel to an OpenCL kernel involves minimal modifications. Making such a kernel compile with ATIs build tools involves more modifications. Our performance tests measure and compare data transfer times to and from the GPU, kernel execution times, and end-to-end application execution times for both CUDA and OpenCL.

الأداء النظم الموزعة والتوازية والحوسبة العنقودية الفيزياء الحسابية

Importance of Explicit Vectorization for CPU and GPU Software Performance

119 - Neil G. Dickson , Kamran Karimi , Firas Hamze 2010

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, a re less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.

النظم الموزعة والتوازية والحوسبة العنقودية الأداء الفيزياء الحسابية

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد