Our objective is to integrate ML into Fermilab accelerator operations and to provide an accessible framework that can also be used by a broad range of other accelerator systems with dynamic tuning needs. In this proposal, we will develop real-time accelerator control using embedded ML on-chip hardware and fast communication between distributed systems. We will demonstrate this technology for the Mu2e experiment by increasing the overall duty factor and uptime of the experiment through two synergistic projects. First, we will use deep reinforcement learning techniques to improve the performance of the regulation loop through guided optimization, providing stable proton beams extracted from the Delivery Ring to the Mu2e experiment. This requires the development of a digital twin of the system to model the accelerator and develop real-time ML algorithms. Second, we will use de-blending techniques to disentangle and classify overlapping beam losses in the Main Injector and Recycler Ring to reduce overall beam downtime in each machine. This ML model will be deployed within a semi-autonomous operational mode. Both applications require processing at the millisecond scale and will share similar ML-in-hardware techniques and beam instrumentation readout technology. A collaboration between Fermilab and Northwestern University will pull together the talents and resources of accelerator physicists, beam instrumentation engineers, embedded system architects, FPGA board design experts, and ML experts to solve complex real-time accelerator controls challenges, which will enhance the physics program. More broadly, the framework developed for Accelerator Real-time Edge AI Distributed Systems (READS) can be applied to future projects as the accelerator complex is upgraded for the PIP-II and DUNE era.
In this paper, beam diagnostic and monitoring tools developed by the MAX IV Operations Group are discussed. In particular, new software tools for beam position monitoring and accelerator tune visualization, as well as tools that directly influence beam quality and stability, are introduced. An availability and downtime monitoring application is also presented.
The Turkic Accelerator Complex (TAC) has recently been proposed as a regional facility for accelerator-based fundamental and applied research. The complex will include a linac-on-ring type electron-positron collider serving as a phi, charm and tau factory, a linac-based free electron laser (FEL), a ring-based third-generation synchrotron radiation (SR) source and a few-GeV proton accelerator. Preliminary estimates show that an integrated luminosity of a hundred inverse femtobarns per year can be achieved for the factory options. The FEL facility is planned to produce laser beams in the IR to soft X-ray region. In addition, the SR facility will produce photon beams in the UV and X-ray regions. The proton accelerator will make it possible to produce muon and neutron beams for applied research. The current status of the conceptual study of the complex is presented.
We describe a method for precisely regulating the gradient magnet power supply at the Fermilab Booster accelerator complex using a neural network trained via reinforcement learning. We demonstrate preliminary results by training a surrogate machine-learning model on real accelerator data to emulate the Booster environment, and using this surrogate model in turn to train the neural network for its regulation task. We additionally show how the neural networks to be deployed for control purposes may be compiled to execute on field-programmable gate arrays. This capability is important for operational stability in complicated environments such as an accelerator facility.
Safety-critical distributed cyber-physical systems (CPSs) are found in a wide range of applications. Notably, they have proven useful in intelligent transportation, where autonomous vehicles communicate and cooperate with each other via a high-speed communication network. Such systems must be able to identify, in real time, maneuvers that cause dangerous circumstances and to ensure that the implementation always meets safety-critical requirements. In this paper, we propose a real-time decentralized reachability approach for safety verification of a distributed multi-agent CPS, under the assumption that all agents are time-synchronized with a low degree of error. In the proposed approach, each agent periodically computes its local reachable set and exchanges this reachable set with the other agents with the goal of verifying the system safety. Our method, implemented in Java, takes advantage of the timing information and the reachable set information available in the exchanged messages to reason about the safety of the whole system in a decentralized manner. Any particular agent can also perform local safety verification tasks based on its local clock by analyzing the messages it receives. We applied the proposed method to verify, in real time, the safety properties of a group of quadcopters performing a distributed search mission.
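The exchange-and-check idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the axis-aligned box over-approximation, the bounded-speed agent model, and all names (`ReachMessage`, `reach_box`, `locally_safe`, `clock_err`) are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ReachMessage:
    """Hypothetical message: a reachable box valid over [t_start, t_end]."""
    agent_id: str
    t_start: float
    t_end: float
    lo: Tuple[float, ...]  # lower corner of the box
    hi: Tuple[float, ...]  # upper corner of the box


def reach_box(agent_id, pos, v_max, t_now, horizon):
    """Over-approximate where an agent with speed <= v_max can be
    during [t_now, t_now + horizon] (assumed model, not from the paper)."""
    r = v_max * horizon
    return ReachMessage(
        agent_id, t_now, t_now + horizon,
        tuple(p - r for p in pos), tuple(p + r for p in pos),
    )


def times_may_overlap(a, b, clock_err):
    """Two validity intervals may overlap if they intersect after
    widening by the assumed clock synchronization error bound."""
    return a.t_start <= b.t_end + clock_err and b.t_start <= a.t_end + clock_err


def boxes_intersect(a, b):
    """Axis-aligned boxes intersect iff they overlap in every dimension."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi))


def locally_safe(own: ReachMessage, received: Iterable[ReachMessage],
                 clock_err: float) -> bool:
    """Local check: no received box collides with ours at a common time."""
    return not any(
        times_may_overlap(own, m, clock_err) and boxes_intersect(own, m)
        for m in received
    )
```

In this sketch, each agent would call `reach_box` periodically, broadcast the result, and run `locally_safe` on the messages it has received. A box that intersects in space but whose validity interval cannot overlap in time, even after allowing for `clock_err`, does not count as a violation.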
The ubiquitous use of IoT and machine learning applications is creating large amounts of data that require accurate and real-time processing. Although edge-based smart data processing can be enabled by deploying pretrained models, the energy and memory constraints of edge devices necessitate distributed deep learning between the edge and the cloud for complex data. In this paper, we propose a distributed AI system that exploits both the edge and the cloud for training and inference. We propose a new architecture, MEANet, with a main block, an extension block, and an adaptive block for the edge. The inference process can terminate at the main block, the extension block, or the cloud. MEANet is trained to categorize inputs into easy/hard/complex classes. The main block identifies instances of easy/hard classes and classifies easy classes with high confidence. Only data with a high probability of belonging to hard classes is sent to the extension block for prediction. Further, only if the neural network at the edge shows low confidence in its prediction is the instance considered complex and sent to the cloud for further processing. The training technique allows the majority of inference to run on edge devices, going to the cloud only for a small set of complex jobs, as determined by the edge. The performance of the proposed system is evaluated via extensive experiments using modified models of ResNets and MobileNetV2 on the CIFAR-100 and ImageNet datasets. The results show that the proposed distributed model improves both accuracy and energy consumption, indicating its capacity to adapt.
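The three exit points described in this abstract (main block, extension block, cloud) can be illustrated with a minimal Python routing sketch. It is not the MEANet implementation: the function names, the use of softmax probabilities as the confidence measure, and the thresholds `hard_threshold` and `conf_threshold` are assumptions made here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def route(main_logits, hard_classes, extension_fn,
          hard_threshold=0.5, conf_threshold=0.8):
    """Return (exit_point, predicted_class_or_None).

    main_logits   : output of the main block for one input
    hard_classes  : set of class indices treated as 'hard' (assumed given)
    extension_fn  : zero-argument callable returning extension-block logits;
                    it is only called when the input looks 'hard'
    """
    probs = softmax(main_logits)
    p_hard = sum(probs[c] for c in hard_classes)

    # Easy input: the main block answers and inference stops on the edge.
    if p_hard < hard_threshold:
        return ("main", argmax(probs))

    # Likely hard input: run the extension block on the edge.
    ext_probs = softmax(extension_fn())
    pred = argmax(ext_probs)
    if ext_probs[pred] >= conf_threshold:
        return ("extension", pred)

    # Low edge confidence: treat as complex and defer to the cloud.
    return ("cloud", None)
```

Passing the extension block as a callable keeps its cost off the path for easy inputs, which is the point of an early exit. How the class groups and thresholds are actually chosen is determined by the paper's training procedure and is not reproduced here.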