
The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture

 Added by Alessandro Lonardo
 Publication date 2012
Language: English





One of the most demanding challenges for designers of parallel computing architectures is to deliver an efficient network infrastructure that provides low-latency, high-bandwidth communication while preserving scalability. Besides off-chip communication between processors, recent multi-tile (i.e. multi-core) architectures face the challenge of providing an efficient on-chip interconnection network between processor tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communication with a uniform RDMA-style API, over a multi-dimensional direct network with a (possibly) hybrid topology.
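To make the idea of an RDMA-style transfer over a multi-dimensional direct network more concrete, here is a minimal Python sketch. It is not the DNP API: the abstract does not specify the topology or the routing algorithm, so the sketch assumes a torus with dimension-order routing, and every name in it (`Tile`, `make_tiles`, `next_hop`, `route`, `rdma_put`) is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Tile:
    """Hypothetical tile: a coordinate plus a small flat local memory."""
    coord: tuple
    memory: bytearray = field(default_factory=lambda: bytearray(64))


def make_tiles(dims):
    """Create one tile per coordinate of a torus with the given dimensions."""
    return {c: Tile(c) for c in product(*(range(n) for n in dims))}


def next_hop(cur, dst, dims):
    """Assumed routing rule (dimension-order): correct the lowest differing
    axis first, stepping in the shorter wrap-around direction."""
    for axis, (c, d, size) in enumerate(zip(cur, dst, dims)):
        if c != d:
            forward = (d - c) % size
            step = 1 if forward <= size - forward else -1
            hop = list(cur)
            hop[axis] = (c + step) % size
            return tuple(hop)
    return cur


def route(src, dst, dims):
    """Return the list of tile coordinates visited from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, dims))
    return path


def rdma_put(tiles, dims, src, dst, src_off, dst_off, length):
    """RDMA-style PUT: copy bytes from the source tile's memory directly
    into the destination tile's memory. Returns the hop path taken."""
    src_mem, dst_mem = tiles[src].memory, tiles[dst].memory
    if src_off + length > len(src_mem) or dst_off + length > len(dst_mem):
        raise ValueError("transfer exceeds tile memory")
    path = route(src, dst, dims)
    dst_mem[dst_off:dst_off + length] = src_mem[src_off:src_off + length]
    return path
```

The caller names only the source and destination tiles and their memory offsets. The same call applies whether the hops would be on-chip or off-chip, which is the sense in which the API is uniform.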




The ever-increasing computational complexity of fast-growing Deep Neural Networks (DNNs) calls for new computing paradigms to overcome the memory wall in conventional Von Neumann computing architectures. The emerging Computing-In-Memory (CIM) architecture is a promising candidate for accelerating neural network computing. However, data movement between CIM arrays may still dominate the total power consumption in conventional designs. This paper proposes a flexible CIM processor architecture named Domino that enables stream computing and local data access to significantly reduce data movement energy. Meanwhile, Domino employs tailored distributed instruction scheduling within a Network-on-Chip (NoC) to implement inter-memory computing and attain mapping flexibility. Evaluation with prevailing CNN models shows that Domino achieves 1.15-to-9.49$\times$ power efficiency over several state-of-the-art CIM accelerators and improves throughput by 1.57-to-12.96$\times$.
Hailong Zhou, Yuhe Zhao, Xu Wang (2019)
Photonic signal processing is essential in optical communication and optical computing. Numerous photonic signal processors have been proposed, but most of them exhibit limited reconfigurability and automation. Fully automatic implementation and intelligent response are highly desirable features for multipurpose photonic signal processors. Here, we report and experimentally demonstrate a fully self-learning and reconfigurable photonic signal processor based on an optical neural network chip. The proposed photonic signal processor is capable of performing various functions, including multichannel optical switching, optical multiple-input-multiple-output descrambling and tunable optical filtering. All the functions are achieved by complete self-learning. Our demonstration suggests great potential for chip-scale, fully programmable optical signal processing with artificial intelligence.
Superconducting digital circuits are a promising approach to building package-level integrated systems with high energy efficiency and computational density. In such systems, the data link between chips mounted on a multi-chip module (MCM) is a critical driver of performance. In this work we report a synchronous data link using Reciprocal Quantum Logic (RQL) enabled by resonant clock distribution on the chip and on the MCM carrier. The simple physical link has only four Josephson junctions and 3 fJ/bit dissipation, including a 300 W/W cooling overhead. The driver produces a signal with 35 GHz analog bandwidth and connects to a single-ended receiver via a 20 $\Omega$ Nb Passive Transmission Line (PTL). To validate this link, we have designed, fabricated and tested two 32$\times$32 mm$^2$ MCMs: one with eight 5$\times$5 mm$^2$ chips connected serially and powered with a meander clock, and one with four 10$\times$10 mm$^2$ chips powered with a 2 GHz resonant clock. The meander clock MCM validates the performance of the data link components, and achieved a 5.4 dB AC bias margin with no degradation relative to individual chip tests. The resonator MCM validates synchronization between chips, with a measured AC bias margin of up to 4.8 dB between two chips. The resonator MCM is capable of powering circuits of 4 million Josephson junctions across the four chips with a projected 10 Gbps serial data rate.
Manycore Systems-on-Chip include an increasing number of processing elements and have become an important research topic for improvements in both hardware and software. While research can be conducted using system simulators, prototyping requires a variety of components and is very time-consuming. With the Open Tiled Manycore System-on-Chip (OpTiMSoC) we aim to build such an environment for use as a prototyping platform in our own and other research projects. This paper describes the project goals and aspects of OpTiMSoC and summarizes the current status and ideas.
Ubiquitous sensing devices frequently disseminate data among themselves. A distributed event-based system that decouples publishers from subscribers is an ideal candidate for implementing the dissemination process. In this paper, we present a network architecture that merges the network and overlay layers of typical structured event-based systems. Directional Random Walks (DRWs) are used to construct this merged layer. Our first results show that DRWs are suitable for balancing the load, using only a few nodes in the network to construct the dissemination path. As future work, we propose to study the properties of this new layer and to work on the design of Bloom filters to manage broker nodes.