The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture

Added by Alessandro Lonardo

Publication date: 2012

Field: Informatics Engineering

Research language: English

Authors: Andrea Biagioni, Francesca Lo Cicero, Alessandro Lonardo

Hardware Architecture; Networking and Internet Architecture

One of the most demanding challenges for designers of parallel computing architectures is to deliver an efficient network infrastructure that provides low-latency, high-bandwidth communication while preserving scalability. Besides off-chip communication between processors, recent multi-tile (i.e. multi-core) architectures face the challenge of providing an efficient on-chip interconnection network between processor tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communication with a uniform RDMA-style API, over a multi-dimensional direct network with a (possibly) hybrid topology.
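To make the phrase "multi-dimensional direct network" concrete, the sketch below models nodes as coordinate tuples and computes direct neighbours and a hop-by-hop path. It is an illustration only: the abstract does not name a topology or a routing scheme, so the torus wrap-around, the dimension-order routing, and the function names `torus_neighbors` and `dimension_order_route` are all assumptions and are not taken from the DNP design.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Direct neighbours of `node` in a torus of shape `dims` (assumed topology)."""
    neighbors: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(node)
            nxt[axis] = (node[axis] + step) % size  # wrap around the ring
            cand = tuple(nxt)
            # Skip self-links (size 1) and duplicates (size 2).
            if cand != node and cand not in neighbors:
                neighbors.append(cand)
    return neighbors


def dimension_order_route(src: Coord, dst: Coord, dims: Coord) -> List[Coord]:
    """Path from src to dst, correcting one dimension at a time.

    Hypothetical routing policy: in each dimension, take the shorter
    direction around the ring.
    """
    path = [src]
    cur = list(src)
    for axis, size in enumerate(dims):
        forward = (dst[axis] - cur[axis]) % size
        step = 1 if forward <= size - forward else -1
        while cur[axis] != dst[axis]:
            cur[axis] = (cur[axis] + step) % size
            path.append(tuple(cur))
    return path


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
    print(dimension_order_route((0, 0), (3, 1), (4, 4)))
```

In a direct network of this kind every node acts as both an endpoint and a router, which is why the neighbour set and the route can be derived from coordinates alone.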

Domino: A Tailored Network-on-Chip Architecture to Enable Highly Localized Inter- and Intra-Memory DNN Computing

Kaining Zhou, Yangshuo He, Rui Xiao (2021)

The ever-increasing computational complexity of fast-growing Deep Neural Networks (DNNs) demands new computing paradigms to overcome the memory wall in conventional von Neumann computing architectures. The emerging Computing-In-Memory (CIM) architecture is a promising candidate for accelerating neural network computing. However, data movement between CIM arrays may still dominate the total power consumption in conventional designs. This paper proposes a flexible CIM processor architecture named Domino that enables stream computing and local data access to significantly reduce data movement energy. Domino also employs tailored distributed instruction scheduling within the Network-on-Chip (NoC) to implement inter-memory computing and attain mapping flexibility. Evaluation with prevailing CNN models shows that Domino achieves $1.15\times$ to $9.49\times$ the power efficiency of several state-of-the-art CIM accelerators and improves throughput by $1.57\times$ to $12.96\times$.

Hardware Architecture

Self-learning photonic signal processor with an optical neural network chip

Hailong Zhou, Yuhe Zhao, Xu Wang (2019)

Photonic signal processing is essential in optical communication and optical computing. Numerous photonic signal processors have been proposed, but most of them exhibit limited reconfigurability and automation. Fully automatic implementation and intelligent response are highly desirable for multipurpose photonic signal processors. Here, we report and experimentally demonstrate a fully self-learning and reconfigurable photonic signal processor based on an optical neural network chip. The proposed photonic signal processor is capable of performing various functions, including multichannel optical switching, optical multiple-input-multiple-output descrambling, and tunable optical filtering. All the functions are achieved by complete self-learning. Our demonstration suggests great potential for chip-scale, fully programmable optical signal processing with artificial intelligence.

Signal Processing; Optics

Synchronous Chip-to-Chip Communication with a Multi-Chip Resonator Clock Distribution Network

Jonathan Egan, Max Nielsen, Joshua Strong (2021)

Superconducting digital circuits are a promising approach to building package-level integrated systems with high energy efficiency and computational density. In such systems, the data link between chips mounted on a multi-chip module (MCM) is a critical driver of performance. In this work we report a synchronous data link using Reciprocal Quantum Logic (RQL), enabled by resonant clock distribution on the chip and on the MCM carrier. The simple physical link has only four Josephson junctions and 3 fJ/bit dissipation, including a 300 W/W cooling overhead. The driver produces a signal with 35 GHz analog bandwidth and connects to a single-ended receiver via a 20 $\Omega$ Nb Passive Transmission Line (PTL). To validate this link, we have designed, fabricated and tested two 32$\times$32 mm$^2$ MCMs: one with eight 5$\times$5 mm$^2$ chips connected serially and powered with a meander clock, and one with four 10$\times$10 mm$^2$ chips powered with a 2 GHz resonant clock. The meander clock MCM validates performance of the data link components, and achieved 5.4 dB AC bias margin with no degradation relative to individual chip test. The resonator MCM validates synchronization between chips, with a measured AC bias margin up to 4.8 dB between two chips. The resonator MCM is capable of powering circuits of 4 million Josephson junctions across the four chips with a projected 10 Gbps serial data rate.
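As a rough reading of the figures quoted above, the energy and power numbers can be related with simple arithmetic. Two assumptions are made here that the abstract does not state: that the 300 W/W cooling overhead acts as a plain multiplicative factor on the cold-stage dissipation, and that a single link runs at the projected 10 Gbps rate.

$$E_{\text{cold}} \approx \frac{3\ \text{fJ/bit}}{300} = 10\ \text{aJ/bit}, \qquad P_{\text{link}} \approx 3\ \text{fJ/bit} \times 10\ \text{Gb/s} = 30\ \mu\text{W}$$

Under those assumptions, the wall-plug power attributable to one link would be on the order of tens of microwatts.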

Applied Physics; Superconductivity

Open Tiled Manycore System-on-Chip

Stefan Wallentowitz, Philipp Wagner, Michael Tempelmeier (2013)

Manycore Systems-on-Chip include an increasing number of processing elements and have become an important research topic for improvements in both hardware and software. While research can be conducted using system simulators, prototyping requires a variety of components and is very time-consuming. With the Open Tiled Manycore System-on-Chip (OpTiMSoC) we aim to build such an environment as a prototyping platform for use in our own and other research projects. This paper describes the project goals and aspects of OpTiMSoC and summarizes its current status and ideas.

Hardware Architecture

A Network Architecture for Distributed Event Based Systems in an Ubiquitous Sensing Scenario

Cristina Muñoz, Pierre Leone (2014)

Ubiquitous sensing devices frequently disseminate their data among themselves. A distributed event-based system that decouples publishers from subscribers is an ideal candidate for implementing the dissemination process. In this paper, we present a network architecture that merges the network and overlay layers of typical structured event-based systems. Directional Random Walks (DRWs) are used to construct this merged layer. Our first results show that DRWs are suitable for balancing the load, using only a few nodes in the network to construct the dissemination path. As future work, we propose to study the properties of this new layer and to work on the design of Bloom filters to manage broker nodes.

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture
