We introduce PULSE, a sub-microsecond optical circuit-switched data centre network architecture controlled by distributed hardware schedulers. PULSE is a flat architecture that uses parallel passive coupler-based broadcast-and-select networks. We employ a novel transceiver architecture for dynamic wavelength-timeslot selection to achieve a reconfiguration time down to O(100 ps), establishing timeslots of O(10 ns). A novel scheduling algorithm with a clock period of 2.3 ns performs multiple iterations to maximize throughput and wavelength usage and to reduce latency. To scale, the single-hop PULSE architecture uses disjoint sub-networks, with multiple transceivers for each node in 64-node racks. At the reconfiguration circuit duration (epoch = 120 ns), the scheduling algorithm is shown to achieve up to 93% throughput and 100% wavelength usage of 64 wavelengths, incurring an average latency of 0.7 to 1.2 microseconds, with a best-case median latency of 0.4 microseconds and a tail latency of 5 microseconds, limited by the timeslot (20 ns) and epoch size (120 ns). We show how the 4096-node PULSE architecture allows up to 260k optical channels to be re-used across sub-networks, achieving a capacity of 25.6 Pbps with an energy consumption of 85 pJ/bit.
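The timing and scale figures quoted above can be cross-checked with simple arithmetic. The sketch below is only a consistency check, not the authors' model. In particular, the rule that there is one sub-network per (source rack, destination rack) pair is an assumption made here for illustration; the abstract does not state how sub-networks are counted.

```python
# Back-of-the-envelope check of figures quoted in the PULSE abstract.
# All constants below are taken from the abstract itself.
EPOCH_NS = 120.0        # circuit reconfiguration period (epoch)
TIMESLOT_NS = 20.0      # timeslot duration
CLOCK_NS = 2.3          # scheduler clock period
NODES = 4096
NODES_PER_RACK = 64
WAVELENGTHS = 64

# Timeslots that fit into one epoch.
slots_per_epoch = round(EPOCH_NS / TIMESLOT_NS)

# Whole scheduler clock cycles available within one epoch.
clock_cycles_per_epoch = int(EPOCH_NS // CLOCK_NS)

# Number of racks in the 4096-node configuration.
racks = NODES // NODES_PER_RACK

# ASSUMPTION (not stated in the abstract): one sub-network per
# (source rack, destination rack) pair, each carrying all wavelengths.
subnetworks = racks * racks
channels = subnetworks * WAVELENGTHS

print(slots_per_epoch, clock_cycles_per_epoch, racks, channels)
```

Under that assumption the channel count comes out at 262,144, which is in line with the quoted figure of up to 260k re-usable optical channels.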
Internet traffic continues to grow relentlessly, driven largely by increasingly high-resolution video content. Although studies have shown that the majority of packets processed by Internet routers are pass-through traffic, these packets nonetheless have to be queued and routed at every hop in current networks, which unnecessarily adds substantial delays and processing costs. Such pass-through traffic can be better circuit-switched through the underlying optical transport network by means of pre-established circuits, which is possible in a unified packet- and circuit-switched network. In this paper, we propose a novel convex optimization framework based on a new destination-based multicommodity flow formulation for the allocation of circuits in such unified networks. In particular, we consider two deployment settings, one based on real-time traffic monitoring, and the other relying upon history-based traffic predictions. In both cases, we formulate global network optimization objectives as concave functions that capture the fair sharing of network capacity among competing traffic flows. The convexity of our problem formulations ensures globally optimal solutions.
Radiation sensors based on the heating effect of the absorbed radiation are typically relatively simple to operate and flexible in terms of the input frequency. Consequently, they are widely applied, for example, in gas detection, security, THz imaging, astrophysical observations, and medical applications. A new spectrum of important applications is currently emerging from quantum technology and especially from electrical circuits behaving quantum mechanically. This circuit quantum electrodynamics (cQED) has given rise to unprecedented single-photon detectors and a quantum computer that outperforms classical supercomputers in a certain task. Thermal sensors are appealing in enhancing these devices since they are not plagued by quantum noise and are smaller, simpler, and consume about six orders of magnitude less power than the commonly used traveling-wave parametric amplifiers. However, despite great progress in the speed and noise levels of thermal sensors, no bolometer to date has proven fast and sensitive enough to provide advantages in cQED. Here, we experimentally demonstrate a bolometer surpassing this threshold with a noise equivalent power of $30\,\mathrm{zW}/\sqrt{\mathrm{Hz}}$, on par with the current record, while providing a two orders of magnitude shorter thermal time constant of 500 ns. Importantly, both of these characteristic numbers have been measured directly from the same device, which implies a faithful estimation of the calorimetric energy resolution of a single 30-GHz photon. These improvements stem from the utilization of a graphene monolayer as the active material with extremely low specific heat. The minimum demonstrated time constant of 200 ns falls greatly below the state-of-the-art dephasing times of roughly 100 $\mu$s for superconducting qubits and meets the timescales of contemporary readout schemes, thus enabling the utilization of thermal detectors in cQED.
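To see why the quoted noise equivalent power and time constant are relevant at the single-photon level, one can compare the energy of a 30-GHz photon, E = hf, with the common rule-of-thumb energy scale NEP·√τ for a thermal detector. The sketch below is an order-of-magnitude illustration only. The rule of thumb is an assumption introduced here, not the estimation method used by the authors.

```python
import math

PLANCK_J_S = 6.62607015e-34   # Planck constant (exact SI value), J*s

# Figures quoted in the abstract.
NEP_W_PER_SQRT_HZ = 30e-21    # noise equivalent power, 30 zW/sqrt(Hz)
TAU_S = 500e-9                # thermal time constant, 500 ns
PHOTON_FREQ_HZ = 30e9         # 30-GHz photon

# Energy of a single 30-GHz photon, E = h * f.
photon_energy_j = PLANCK_J_S * PHOTON_FREQ_HZ

# ASSUMPTION: rough calorimetric energy scale, dE ~ NEP * sqrt(tau).
# This is a generic rule of thumb, not the paper's analysis.
energy_scale_j = NEP_W_PER_SQRT_HZ * math.sqrt(TAU_S)

ratio = energy_scale_j / photon_energy_j

print(f"photon energy : {photon_energy_j:.3e} J")
print(f"NEP*sqrt(tau) : {energy_scale_j:.3e} J")
print(f"ratio         : {ratio:.2f}")
```

Both quantities come out near 2e-23 J, so under this rough estimate the detector's energy scale is comparable to the energy of one 30-GHz photon.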
The capacity of offloading data and control tasks to the network is becoming increasingly important, especially if we consider the faster growth of network speed when compared to CPU frequencies. In-network compute alleviates the host CPU load by running tasks directly in the network, enabling additional computation/communication overlap and potentially improving overall application performance. However, sustaining bandwidths provided by next-generation networks, e.g., 400 Gbit/s, can become a challenge. sPIN is a programming model for in-NIC compute, where users specify handler functions that are executed on the NIC, for each incoming packet belonging to a given message or flow. It enables a CUDA-like acceleration, where the NIC is equipped with lightweight processing elements that process network packets in parallel. We investigate the architectural specialties that a sPIN NIC should provide to enable high-performance, low-power, and flexible packet processing. We introduce PsPIN, a first open-source sPIN implementation, based on a multi-cluster RISC-V architecture and designed according to the identified architectural specialties. We investigate the performance of PsPIN with cycle-accurate simulations, showing that it can process packets at 400 Gbit/s for several use cases, introducing minimal latencies (26 ns for 64 B packets) and occupying a total area of 18.5 mm² (22 nm FDSOI).
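The per-packet time budget implied by the 400 Gbit/s figure can be worked out directly, which shows why parallel processing elements are needed. The sketch below ignores link-level framing overhead (preamble, inter-packet gap). It also treats the quoted 26 ns as the time each packet spends in the NIC. Both are simplifying assumptions made here, not statements from the paper.

```python
# Figures quoted in the abstract.
LINE_RATE_BIT_S = 400e9   # 400 Gbit/s
PACKET_BYTES = 64         # 64 B packets
LATENCY_NS = 26.0         # latency quoted for 64 B packets

packet_bits = PACKET_BYTES * 8

# Time between back-to-back 64 B packet arrivals at line rate
# (ASSUMPTION: no framing overhead is counted).
packet_time_ns = packet_bits / LINE_RATE_BIT_S * 1e9

# Corresponding packet arrival rate.
packets_per_second = LINE_RATE_BIT_S / packet_bits

# Little's law: average packets in flight = arrival rate * time in system.
# ASSUMPTION: the 26 ns latency is the time each packet spends in the NIC.
packets_in_flight = LATENCY_NS / packet_time_ns

print(f"{packet_time_ns:.2f} ns per packet")
print(f"{packets_per_second / 1e6:.2f} Mpps")
print(f"{packets_in_flight:.1f} packets in flight")
```

At roughly 1.28 ns per packet, about 20 minimum-size packets would be in flight at once under these assumptions, so a single sequential core could not keep up on its own.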
This paper has been withdrawn by the authors.