Near real-time streaming analysis of big fusion data


Abstract in English

While experiments on fusion plasmas produce high-dimensional data time series with ever increasing magnitude and velocity, data analysis has been lagging behind this development. For example, many data analysis tasks are often performed in a manual, ad-hoc manner some time after an experiment. In this article we introduce the DELTA framework that facilitates near real-time streaming analysis of big and fast fusion data. By streaming measurement data from fusion experiments to a high-performance compute center, DELTA allows to perform demanding data analysis tasks in between plasma pulses. This article describe the modular and expandable software architecture of DELTA and presents performance benchmarks of its individual components as well as of entire workflows. Our focus is on the streaming analysis of ECEi data measured at KSTAR on NERSCs supercomputers and we routinely achieve data transfer rates of about 500 Megabyte per second. We show that a demanding turbulence analysis workload can be distributed among multiple GPUs and executes in under 5 minutes. We further discuss how DELTA uses modern database systems and container orchestration services to provide web-based real-time data visualization. For the case of ECEi data we demonstrate how data visualizations can be augmented with outputs from machine learning models. By providing session leaders and physics operators results of higher order data analysis using live visualization they may monitor the evolution of a long-pulse discharge in near real-time and may make more informed decision on how to configure the machine for the next shot.

Download