Do you want to publish a course? Click here

Simulating spin systems on IANUS, an FPGA-based computer

147   0   0.0 ( 0 )
 Added by Andrea Maiorano
 Publication date 2007
and research's language is English




Ask ChatGPT about the research

We describe the hardwired implementation of algorithms for Monte Carlo simulations of a large class of spin models. We have implemented these algorithms as VHDL codes and we have mapped them onto a dedicated processor based on a large FPGA device. The measured performance on one such processor is comparable to O(100) carefully programmed high-end PCs: it turns out to be even better for some selected spin models. We describe here codes that we are currently executing on the IANUS massively parallel FPGA-based system.



rate research

Read More

Dedicated machines designed for specific computational algorithms can outperform conventional computers by several orders of magnitude. In this note we describe {it Ianus}, a new generation FPGA based machine and its basic features: hardware integration and wide reprogrammability. Our goal is to build a machine that can fully exploit the performance potential of new generation FPGA devices. We also plan a software platform which simplifies its programming, in order to extend its intended range of application to a wide class of interesting and computationally demanding problems. The decision to develop a dedicated processor is a complex one, involving careful assessment of its performance lead, during its expected lifetime, over traditional computers, taking into account their performance increase, as predicted by Moores law. We discuss this point in detail.
The entropy stabilized oxide Mg$_{0.2}$Co$_{0.2}$Ni$_{0.2}$Cu$_{0.2}$Zn$_{0.2}$O exhibits antiferromagnetic order and magnetic excitations, as revealed by recent neutron scattering experiments. This observation raises the question of the nature of spin wave excitations in such disordered systems. Here, we investigate theoretically the magnetic ground state and the spin-wave excitations using linear spin-wave theory in combination with the supercell approximation to take into account the extreme disorder in this magnetic system. We find that the experimentally observed antiferromagnetic structure can be stabilized by a rhombohedral distortion together with large second nearest neighbor interactions. Our calculations show that the spin-wave spectrum consists of a well-defined low-energy coherent spectrum in the background of an incoherent continuum that extends to higher energies.
Single computation engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs) enabling the deployment of diverse models without fabric reconfiguration. This flexibility, however, often comes with significantly reduced performance on memory-bound layers and resource underutilisation due to suboptimal mapping of certain layers on the engines fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. To minimise the negative impact of limited bandwidth on memory-bound layers, we present a novel hardware component that enables the on-chip on-the-fly generation of weights. We further introduce an input selective processing element (PE) design that balances the load between PEs on suboptimally mapped layers. Finally, we present unzipFPGA, a framework to train on-the-fly models and traverse the design space to select the highest performing CNN engine configuration. Quantitative evaluation shows that unzipFPGA yields an average speedup of 2.14x and 71% over optimised status-quo and pruned CNN engines under constrained bandwidth and up to 3.69x higher performance density over the state-of-the-art FPGA-based CNN accelerators.
QPACE is a novel parallel computer which has been developed to be primarily used for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor that is used in the Playstation 3. The QPACE nodes are interconnected by a custom, application optimized 3-dimensional torus network implemented on an FPGA. To achieve the very high packaging density of 26 TFlops per rack a new water cooling concept has been developed and successfully realized. In this paper we give an overview of the architecture and highlight some important technical details of the system. Furthermore, we provide initial performance results and report on the installation of 8 QPACE racks providing an aggregate peak performance of 200 TFlops.
71 - Qi Ni , Fei Wang , Ziwei Zhao 2019
Image feature extraction and matching is a fundamental but computation intensive task in machine vision. This paper proposes a novel FPGA-based embedded system to accelerate feature extraction and matching. It implements SURF feature point detection and BRIEF feature descriptor construction and matching. For binocular stereo vision, feature matching includes both tracking matching and stereo matching, which simultaneously provide feature point correspondences and parallax information. Our system is evaluated on a ZYNQ XC7Z045 FPGA. The result demonstrates that it can process binocular video data at a high frame rate (640$times$480 @ 162fps). Moreover, an extensive test proves our system has robustness for image compression, blurring and illumination.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا