No Arabic abstract
Micro- and nanosatellites have become popular platforms for a variety of commercial and scientific applications, but today are considered suitable mainly for short and low-priority space missions due to their low reliability. In part, this can be attributed to their reliance upon cheap, low-feature size, COTS components originally designed for embedded and mobile-market applications, for which traditional hardware-voting concepts are ineffective. Software-fault-tolerance concepts have been shown effective for such systems, but have largely been ignored by the space industry due to low maturity, as most have only been researched in theory. In practice, designers of payload instruments and miniaturized satellites are usually forced to sacrifice reliability in favor deliver the level of performance necessary for cutting-edge science and innovative commercial applications. Thus, we developed a software-fault-tolerance-approach based upon thread-level coarse-grain lockstep, which was validated using fault-injection. To offer strong long-term fault coverage, our architecture is implemented as tiled MPSoC on an FPGA, utilizing partial reconfiguration, as well as mixed criticality. This architecture can satisfy the high performance requirements of current and future scientific and commercial space missions at very low cost, while offering the strong fault-coverage guarantees necessary for platform control even for missions with a long duration. This architecture was developed for a 4-year ESA project. Together with two industrial partners, we are developing a prototype to then undergo radiation testing.
The CMOS integrated chips at advanced technology nodes are becoming more vulnerable to various sources of faults like manufacturing imprecisions, variations, aging, etc. Additionally, the intentional fault attacks (e.g., high power microwave, cybersecurity threats, etc.) and environmental effects (i.e., radiation) also pose reliability threats to integrated circuits. Though the traditional hardware redundancy-based techniques like Triple Modular Redundancy (TMR), Quadded (QL) Logic etc. mitigate the risk to some extent, they add huge hardware overhead and are not very effective. Truly polymorphic circuits that are inherently capable of achieving multiple functionalities in a limited footprint could enhance the faultresilience/recovery of the circuits with limited overhead. We demonstrate a novel crosstalk logic based polymorphic circuit approach to achieve compact and efficient fault resilient circuits. We show a range of polymorphic primitive gates and their usage in a functional unit. The functional unit is a single arithmetic circuit that is capable of delivering Multiplication/Sorting/Addition output depending on the control inputs. Using such polymorphic computing units in an ALU would imply that a correct path for functional output is possible even when 2/3rd of the ALU is damaged. Our comparison results with respect to existing polymorphic techniques and CMOS reveal 28% and 62% reduction in transistor count respectively for the same functionalities. In conjunction with fault detection algorithms, the proposed polymorphic circuit concept can be transformative for fault tolerant circuit design directions with minimum overhead.
The paper presents fault-tolerant (FT) labeling schemes for general graphs, as well as, improved FT routing schemes. For a given $n$-vertex graph $G$ and a bound $f$ on the number of faults, an $f$-FT connectivity labeling scheme is a distributed data structure that assigns each of the graph edges and vertices a short label, such that given the labels of the vertices $s$ and $t$, and at most $f$ failing edges $F$, one can determine if $s$ and $t$ are connected in $G setminus F$. The primary complexity measure is the length of the individual labels. Since their introduction by [Courcelle, Twigg, STACS 07], compact FT labeling schemes have been devised only for a limited collection of graph families. In this work, we fill in this gap by proposing two (independent) FT connectivity labeling schemes for general graphs, with a nearly optimal label length. This serves the basis for providing also FT approximate distance labeling schemes, and ultimately also routing schemes. Our main results for an $n$-vertex graph and a fault bound $f$ are: -- There is a randomized FT connectivity labeling scheme with a label length of $O(f+log n)$ bits, hence optimal for $f=O(log n)$. This scheme is based on the notion of cycle space sampling [Pritchard, Thurimella, TALG 11]. -- There is a randomized FT connectivity labeling scheme with a label length of $O(log^3 n)$ bits (independent of the number of faults $f$). This scheme is based on the notion of linear sketches of [Ahn et al., SODA 12]. -- For $kgeq 1$, there is a randomized routing scheme that routes a message from $s$ to $t$ in the presence of a set $F$ of faulty edges, with stretch $O(|F|^2 k)$ and routing tables of size $tilde{O}(f^3 n^{1/k})$. This significantly improves over the state-of-the-art bounds by [Chechik, ICALP 11], providing the first scheme with sub-linear FT labeling and routing schemes for general graphs.
Quantum computation promises significant computational advantages over classical computation for some problems. However, quantum hardware suffers from much higher error rates than in classical hardware. As a result, extensive quantum error correction is required to execute a useful quantum algorithm. The decoder is a key component of the error correction scheme whose role is to identify errors faster than they accumulate in the quantum computer and that must be implemented with minimum hardware resources in order to scale to the regime of practical applications. In this work, we consider surface code error correction, which is the most popular family of error correcting codes for quantum computing, and we design a decoder micro-architecture for the Union-Find decoding algorithm. We propose a three-stage fully pipelined hardware implementation of the decoder that significantly speeds up the decoder. Then, we optimize the amount of decoding hardware required to perform error correction simultaneously over all the logical qubits of the quantum computer. By sharing resources between logical qubits, we obtain a 67% reduction of the number of hardware units and the memory capacity is reduced by 70%. Moreover, we reduce the bandwidth required for the decoding process by a factor at least 30x using low-overhead compression algorithms. Finally, we provide numerical evidence that our optimized micro-architecture can be executed fast enough to correct errors in a quantum computer.
It was recently shown that a version of the greedy algorithm gives a construction of fault-tolerant spanners that is size-optimal, at least for vertex faults. However, the algorithm to construct this spanner is not polynomial-time, and the best-known polynomial time algorithm is significantly suboptimal. Designing a polynomial-time algorithm to construct (near-)optimal fault-tolerant spanners was given as an explicit open problem in the two most recent papers on fault-tolerant spanners ([Bodwin, Dinitz, Parter, Vassilevka Williams SODA 18] and [Bodwin, Patel PODC 19]). We give a surprisingly simple algorithm which runs in polynomial time and constructs fault-tolerant spanners that are extremely close to optimal (off by only a linear factor in the stretch) by modifying the greedy algorithm to run in polynomial time. To complement this result, we also give simple distributed constructions in both the LOCAL and CONGEST models.
Quantum error correction (QEC) is an essential step towards realising scalable quantum computers. Theoretically, it is possible to achieve arbitrarily long protection of quantum information from corruption due to decoherence or imperfect controls, so long as the error rate is below a threshold value. The two-dimensional surface code (SC) is a fault-tolerant error correction protocol} that has garnered considerable attention for actual physical implementations, due to relatively high error thresholds ~1%, and restriction to planar lattices with nearest-neighbour interactions. Here we show a necessary element for SC error correction: high-fidelity parity detection of two code qubits via measurement of a third syndrome qubit. The experiment is performed on a sub-section of the SC lattice with three superconducting transmon qubits, in which two independent outer code qubits are joined to a central syndrome qubit via two linking bus resonators. With all-microwave high-fidelity single- and two-qubit nearest-neighbour entangling gates, we demonstrate entanglement distributed across the entire sub-section by generating a three-qubit Greenberger-Horne-Zeilinger (GHZ) state with fidelity ~94%. Then, via high-fidelity measurement of the syndrome qubit, we deterministically entangle the otherwise un-coupled outer code qubits, in either an even or odd parity Bell state, conditioned on the syndrome state. Finally, to fully characterize this parity readout, we develop a new measurement tomography protocol to obtain a fidelity metric (90% and 91%). Our results reveal a straightforward path for expanding superconducting circuits towards larger networks for the SC and eventually a primitive logical qubit implementation.