ﻻ يوجد ملخص باللغة العربية
This chapter introduces the state-of-the-art in the emerging area of combining High Performance Computing (HPC) with Big Data Analysis. To understand the new area, the chapter first surveys the existing approaches to integrating HPC with Big Data. Next, the chapter introduces several optimization solutions that focus on how to minimize the data transfer time from computation-intensive applications to analysis-intensive applications as well as minimizing the end-to-end time-to-solution. The solutions utilize SDN to adaptively use both high speed interconnect network and high performance parallel file systems to optimize the application performance. A computational framework called DataBroker is designed and developed to enable a tight integration of HPC with data analysis. Multiple types of experiments have been conducted to show different performance issues in both message passing and parallel file systems and to verify the effectiveness of the proposed research approaches.
Several fundamental changes in technology indicate domain-specific hardware and software co-design is the only path left. In this context, architecture, system, data management, and machine learning communities pay greater attention to innovative big
Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of tra
Principal component analysis (PCA) is not only a fundamental dimension reduction method, but is also a widely used network anomaly detection technique. Traditionally, PCA is performed in a centralized manner, which has poor scalability for large dist
The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows t
Background: Metabolomics datasets are becoming increasingly large and complex, with multiple types of algorithms and workflows needed to process and analyse the data. A cloud infrastructure with portable software tools can provide much needed resourc