ﻻ يوجد ملخص باللغة العربية
Background: Metabolomics datasets are becoming increasingly large and complex, with multiple types of algorithms and workflows needed to process and analyse the data. A cloud infrastructure with portable software tools can provide much needed resources enabling faster processing of much larger datasets than would be possible at any individual lab. The PhenoMeNal project has developed such an infrastructure, allowing users to run analyses on local or commercial cloud platforms. We have examined the computational scaling behaviour of the PhenoMeNal platform using four different implementations across 1-1000 virtual CPUs using two common metabolomics tools. Results: Our results show that data which takes up to 4 days to process on a standard desktop computer can be processed in just 10 min on the largest cluster. Improved runtimes come at the cost of decreased efficiency, with all platforms falling below 80% efficiency above approximately 1/3 of the maximum number of vCPUs. An economic analysis revealed that running on large scale cloud platforms is cost effective compared to traditional desktop systems. Conclusions: Overall, cloud implementations of PhenoMeNal show excellent scalability for standard metabolomics computing tasks on a range of platforms, making them a compelling choice for research computing in metabolomics.
Big data is gaining overwhelming attention since the last decade. Almost all the fields of science and technology have experienced a considerable impact from it. The cloud computing paradigm has been targeted for big data processing and mining in a m
This chapter introduces the state-of-the-art in the emerging area of combining High Performance Computing (HPC) with Big Data Analysis. To understand the new area, the chapter first surveys the existing approaches to integrating HPC with Big Data. Ne
In this work we present kiwiPy, a Python library designed to support robust message based communication for high-throughput, big-data, applications while being general enough to be useful wherever high-volumes of messages need to be communicated in a
With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasingly attention from both academia and industry. This paper presents a Parallel Random Forest (PRF) a
Cloud-based enterprise search services (e.g., Amazon Kendra) are enchanting to big data owners by providing them with convenient search solutions over their enterprise big datasets. However, individuals and businesses that deal with confidential big