Implementation of high-performance, sub-microsecond deep neural networks on FPGAs for trigger applications


Abstract in English

Artificial neural networks are already widely used for physics analysis, but there are only a few applications within low-level hardware triggers, and those typically use small networks. Modern high-end FPGAs offer tera-scale arithmetic performance and thereby provide a significant number of operations per data set even at MHz-range data rates. We present a bottom-up approach to implementing typical neural network layers that takes into account both the special constraints of high-performance trigger systems, such as the ATLAS hardware trigger at the LHC, and the need for an efficient implementation. By specifically designing each layer type to match our requirements, we developed a framework that reaches 90 to 100% processing efficiency for large layers, requires only a few extra resources for data flow and control, and offers latencies in the range of only tens to hundreds of nanoseconds for entire (deep) networks. Additionally, a toolkit was built around these optimized layer implementations, which facilitates the creation of the FPGA implementation of a trained NN model.
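
The claim about operations per data set can be illustrated with a short back-of-the-envelope calculation. The sketch below is not taken from the work itself: the function names and all numbers (1 tera-operation per second, a 40 MHz data rate, 512 multipliers clocked at 320 MHz, a 64 x 64 fully connected layer) are hypothetical values chosen for illustration, and "processing efficiency" is assumed here to mean the fraction of available multiplier slots that perform useful work.

```python
def ops_per_data_set(peak_ops_per_second: float, data_rate_hz: float) -> float:
    """Arithmetic operations available for each incoming data set."""
    return peak_ops_per_second / data_rate_hz


def processing_efficiency(useful_macs: int, multipliers: int,
                          clock_hz: float, data_rate_hz: float) -> float:
    """Fraction of multiplier slots doing useful work (assumed definition)."""
    # Clock cycles that elapse between two consecutive data sets.
    cycles_per_data_set = clock_hz / data_rate_hz
    # Each multiplier can perform one multiply-accumulate per cycle.
    available_macs = multipliers * cycles_per_data_set
    return useful_macs / available_macs


# Hypothetical device budget: 1e12 operations/s at a 40 MHz data rate.
budget = ops_per_data_set(1e12, 40e6)

# Hypothetical fully connected layer: 64 inputs x 64 neurons = 4096 MACs,
# mapped onto 512 multipliers running at 320 MHz (8 cycles per data set).
efficiency = processing_efficiency(64 * 64, 512, 320e6, 40e6)

print(f"operations per data set: {budget:.0f}")
print(f"processing efficiency:   {efficiency:.0%}")
```

Under these assumed numbers, each data set has a budget of 25,000 operations, and the example layer keeps every multiplier busy in every cycle. An efficiency below 100% would correspond to multiplier slots left idle, for example when the layer size does not divide evenly across the available multipliers.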
