Data analysis in HEP has often relied on batch systems and event loops: users are given a non-interactive interface to computing resources and process data event by event. The Coffea-casa prototype analysis facility is an effort to provide users with alternative mechanisms to access computing resources and to enable new programming paradigms. Instead of a command-line interface and asynchronous batch access, a notebook-based web interface and interactive computing are provided. Instead of writing event loops, analyses use the column-based Coffea library. In this paper, we describe the architectural components of the facility, the services offered to end users, and how the facility integrates into a larger ecosystem for data access and authentication.
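To give a flavor of the columnar paradigm, here is a minimal sketch using the Awkward Array library that Coffea builds on; the field name muon_pt and the toy values are invented for illustration and are not taken from the facility's actual workflows.

```python
import awkward as ak
import numpy as np

# Columnar analogue of an event loop: events are held as jagged
# (variable-length) arrays and cuts are expressed as array operations.
# The toy data below stands in for muon pT values read from a file.
events = ak.Array({
    "muon_pt": [[41.2, 17.5], [], [63.0], [12.1, 8.4, 29.9]],
})

# An event loop would read:
#   for event in events: for mu_pt in event: if mu_pt > 20: ...
# The columnar version applies the cut to every muon in every event at once.
good_muons = events.muon_pt[events.muon_pt > 20.0]

# Per-event counts and a flattened array ready for histogramming.
n_good = ak.num(good_muons)        # [1, 0, 1, 1]
flat_pt = ak.flatten(good_muons)   # [41.2, 63.0, 29.9]
print(n_good.tolist(), np.asarray(flat_pt))
```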
HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. To reliably meet the peak demands of the next generation of High Energy Physics experiments, Fermilab must plan to elastically expand its computational capabilities to cover the forecasted need. Commercial cloud and allocation-based High Performance Computing (HPC) resources both carry explicit and implicit costs that must be weighed when deciding when, and at what scale, to provision them. To support such provisioning in a manner consistent with organizational business rules and budget constraints, we have developed a modular intelligent decision support system (IDSS) to aid in the automatic provisioning of resources spanning multiple cloud providers, multiple HPC centers, and grid computing federations. In this paper, we discuss the goals and architecture of the HEPCloud Facility, the architecture of the IDSS, and our early experience in using the IDSS for automated facility expansion at both Fermilab and Brookhaven National Laboratory.
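To make the provisioning trade-off concrete, the following toy sketch shows the kind of cost-aware rule an IDSS could encode; the Provider class, the greedy cheapest-first policy, and all numbers are illustrative assumptions, not HEPCloud's actual decision engine.

```python
from dataclasses import dataclass

# Hypothetical sketch: cover forecast demand above local capacity,
# cheapest provider first, without exceeding the remaining budget.
@dataclass
class Provider:
    name: str                  # e.g. "hpc-allocation", "commercial-cloud"
    cost_per_core_hour: float  # dollar cost or allocation burn rate
    available_core_hours: float

def plan_burst(forecast: float, local: float, budget: float,
               providers: list[Provider]) -> dict[str, float]:
    """Return core-hours to request from each provider."""
    shortfall = max(0.0, forecast - local)
    plan: dict[str, float] = {}
    for p in sorted(providers, key=lambda p: p.cost_per_core_hour):
        if shortfall <= 0 or budget <= 0:
            break
        take = min(shortfall, p.available_core_hours,
                   budget / p.cost_per_core_hour)
        plan[p.name] = take
        shortfall -= take
        budget -= take * p.cost_per_core_hour
    return plan

# Example: 1M core-hours forecast, 600k local, $10k budget.
print(plan_burst(1.0e6, 6.0e5, 1.0e4,
                 [Provider("hpc-allocation", 0.01, 3.0e5),
                  Provider("commercial-cloud", 0.05, 1.0e6)]))
# -> {'hpc-allocation': 300000.0, 'commercial-cloud': 100000.0}
```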
The emerging Internet of Things (IoT) faces significant scalability and security challenges. On the one hand, IoT devices are resource-constrained and need external assistance: edge computing is a promising direction for addressing the deficiency of centralized cloud computing in scaling to a massive number of devices. On the other hand, those same resource constraints leave IoT devices relatively vulnerable to malicious attackers. The emerging blockchain and smart contract technologies bring a series of new security features to IoT and edge computing. In this paper, to address these challenges, we design and prototype an edge-IoT framework named EdgeChain based on blockchain and smart contracts. The core idea is to integrate a permissioned blockchain and its internal currency or coin system to link the edge cloud resource pool with each IoT device's account and resource usage, and hence with the behavior of the IoT devices. EdgeChain uses a credit-based resource management system to control how much resource an IoT device can obtain from edge servers, based on pre-defined rules covering priority, application type, and past behavior. Smart contracts enforce these rules and policies to regulate IoT device behavior in a non-deniable and automated manner. All IoT activities and transactions are recorded on the blockchain for secure data logging and auditing. We implement an EdgeChain prototype and conduct extensive experiments to evaluate these ideas. The results show that, while gaining the security benefits of blockchain and smart contracts, the cost of integrating them into EdgeChain remains within a reasonable and acceptable range.
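The following minimal sketch illustrates a credit-based allocation rule in the spirit described above; the account fields, scoring weights, and quota formula are invented for illustration and do not reproduce EdgeChain's actual contract logic.

```python
from dataclasses import dataclass

# Hypothetical credit-based resource rule: credit grows with honest
# usage and priority, shrinks with recorded policy violations. In an
# EdgeChain-style system, a smart contract would evaluate such a rule
# and log the grant on the permissioned blockchain.
@dataclass
class DeviceAccount:
    device_id: str
    priority: int     # pre-defined per application type (higher = more)
    good_txns: int    # well-behaved resource requests recorded on-chain
    violations: int   # policy violations recorded on-chain

def credit(dev: DeviceAccount) -> float:
    return max(0.0, dev.priority * 10 + dev.good_txns - 5 * dev.violations)

def grant_cpu_ms(dev: DeviceAccount, requested_ms: int, pool_ms: int) -> int:
    """Cap a device's grant by its credit-derived quota and the edge pool."""
    quota_ms = int(credit(dev) * 100)  # 100 ms of CPU per credit point
    return min(requested_ms, quota_ms, pool_ms)

well_behaved = DeviceAccount("cam-7", priority=2, good_txns=40, violations=0)
misbehaving = DeviceAccount("bulb-3", priority=1, good_txns=5, violations=4)
print(grant_cpu_ms(well_behaved, 8000, 50_000))  # 6000 - quota-limited grant
print(grant_cpu_ms(misbehaving, 8000, 50_000))   # 0 - credit exhausted
```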
The AEI 10 m prototype interferometer facility is currently being constructed at the Albert Einstein Institute in Hannover, Germany. It aims to perform experiments for future gravitational wave detectors using advanced techniques. Seismically isolated benches are planned to be interferometrically interconnected and stabilized, forming a low-noise testbed inside a 100 m^3 ultra-high vacuum system. A well-stabilized high-power laser will perform differential position readout of 100 g test masses in a 10 m suspended arm-cavity-enhanced Michelson interferometer at the crossover of measurement (shot) noise and backaction (quantum radiation pressure) noise, the so-called Standard Quantum Limit (SQL). Such a sensitivity enables experiments in the highly topical field of macroscopic quantum mechanics. In this article we introduce the experimental facility and describe the methods employed; technical details of the subsystems will be covered in future papers.
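For reference, the free-mass Standard Quantum Limit mentioned above has the textbook form below, for a test mass of mass m probed at angular frequency Ω (up to geometry-dependent factors of order unity for a suspended Michelson); at this spectral density the shot-noise and radiation-pressure contributions are equal, which is the crossover the 10 m interferometer is designed to reach.

```latex
% Free-mass standard quantum limit for displacement sensing:
% shot noise and quantum radiation pressure noise contribute equally here.
\sqrt{S_x^{\mathrm{SQL}}(\Omega)} = \sqrt{\frac{2\hbar}{m\,\Omega^{2}}}
```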
The Atacama Large mm and sub-mm Array (ALMA) radio observatory is one of the world's largest astronomical projects. After the very successful conclusion of the first observing cycles, Early Science Cycles 0 and 1, the ALMA project can report many successes and lessons learned. The science data, taken interleaved with commissioning tests for the still-continuing addition of new capabilities, have already resulted in numerous publications in high-profile journals. The increasing data volume and complexity are challenging but under control. The radio-astronomical data analysis package Common Astronomy Software Applications (CASA) has played a crucial role in this effort. This article describes the implementation of the ALMA data quality assurance system, in particular its level 2, which is based on CASA, and the lessons learned.
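As an illustration of what a scripted, CASA-based level-2 check can look like, here is a minimal sketch; the measurement set name and the flagging threshold are hypothetical, and this is not the actual ALMA QA2 pipeline code.

```python
# Hypothetical QA2-style check using standard CASA tasks.
from casatasks import listobs, flagdata

ms = "uid___A002_example.ms"  # illustrative measurement set name

# Record the observing metadata for the QA report.
listobs(vis=ms, listfile=ms + ".listobs.txt", overwrite=True)

# Summarize flagging: QA heuristics typically alarm when too much
# of the data had to be discarded.
summary = flagdata(vis=ms, mode="summary")
flagged_fraction = summary["flagged"] / summary["total"]
print(f"flagged fraction: {flagged_fraction:.1%}")
if flagged_fraction > 0.4:  # illustrative threshold, not an ALMA rule
    print("QA check failed: excessive flagging, dataset needs review")
```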
The interconnect is one of the most critical components in large-scale computing systems, and its impact on application performance will only grow with system size. In this paper, we describe Slingshot, an interconnection network for large-scale computing systems. Slingshot is based on high-radix switches, which allow building exascale and hyperscale datacenter networks with at most three switch-to-switch hops. Moreover, Slingshot provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes. Slingshot uses an optimized Ethernet protocol, which allows it to be interoperable with standard Ethernet devices while providing high performance to HPC applications. We analyze the extent to which Slingshot provides these features, evaluating it on microbenchmarks and on several applications from the datacenter and AI worlds, as well as on HPC applications. We find that applications running on Slingshot are less affected by congestion than on previous-generation networks.
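To see why three switch-to-switch hops can cover systems at this scale, the following back-of-the-envelope sketch sizes a balanced dragonfly built from radix-64 switches; the port split is the standard textbook choice (Kim et al.), offered as an assumption rather than Slingshot's actual configuration.

```python
# In a dragonfly, any minimal route is at most local -> global -> local,
# i.e. three switch-to-switch hops. The "balanced" split of a radix-r
# switch devotes r/4 ports to endpoints, r/4 to global links, and r/2
# to switches within the same group.
r = 64        # switch radix
p = r // 4    # ports per switch to endpoints      -> 16
h = r // 4    # ports per switch to other groups   -> 16
a = r // 2    # switches per group                 -> 32

groups = a * h + 1        # one global link between every pair of groups
endpoints = p * a * groups
print(f"{groups} groups, {endpoints:,} endpoints with <=3 switch hops")
# -> 513 groups, 262,656 endpoints
```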