Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Modern Data Formats for Big Bioinformatics Data Analytics

106 0 0.0 ( 0 )

Download Cite

Added by Shahzad Ahmed Mr.

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Shahzad Ahmed - M. Usman Ali - Javed Ferzund

Databases Computers and Society Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Next Generation Sequencing (NGS) technology has resulted in massive amounts of proteomics and genomics data. This data is of no use if it is not properly analyzed. ETL (Extraction, Transformation, Loading) is an important step in designing data analytics applications. ETL requires proper understanding of features of data. Data format plays a key role in understanding of data, representation of data, space required to store data, data I/O during processing of data, intermediate results of processing, in-memory analysis of data and overall time required to process data. Different data mining and machine learning algorithms require input data in specific types and formats. This paper explores the data formats used by different tools and algorithms and also presents modern data formats that are used on Big Data Platform. It will help researchers and developers in choosing appropriate data format to be used for a particular tool or algorithm.

rate research

Contextualizing Geometric Data Analysis and Related Data Analytics: A Virtual Microscope for Big Data Analytics

183 - Fionn Murtagh , Mohsen Farid 2016

The relevance and importance of contextualizing data analytics is described. Qualitative characteristics might form the context of quantitative analysis. Topics that are at issue include: contrast, baselining, secondary data sources, supplementary data sources, dynamic and heterogeneous data. In geometric data analysis, especially with the Correspondence Analysis platform, various case studies are both experimented with, and are reviewed. In such aspects as paradigms followed, and technical implementation, implicitly and explicitly, an important point made is the major relevance of such work for both burgeoning analytical needs and for new analytical areas including Big Data analytics, and so on. For the general reader, it is aimed to display and describe, first of all, the analytical outcomes that are subject to analysis here, and then proceed to detail the more quantitative outcomes that fully support the analytics carried out.

Artificial Intelligence Computers and Society

Identifying Dwarfs Workloads in Big Data Analytics

468 - Wanling Gao , Chunjie Luo , Jianfeng Zhan 2015

Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we construct a benchmark suite using a minimum set of units of computation to represent diversity of big data analytics workloads? Big data dwarfs are abstractions of extracting frequently appearing operations in big data computing. One dwarf represents one unit of computation, and big data workloads are decomposed into one or more dwarfs. Furthermore, dwarfs workloads rather than vast real workloads are more cost-efficient and representative to evaluate big data systems. In this paper, we extensively investigate six most important or emerging application domains i.e. search engine, social network, e-commerce, multimedia, bioinformatics and astronomy. After analyzing forty representative algorithms, we single out eight dwarfs workloads in big data analytics other than OLAP, which are linear algebra, sampling, logic operations, transform operations, set operations, graph operations, statistic operations and sort.

Databases

Industrial Big Data Analytics: Challenges, Methodologies, and Applications

72 - JunPing Wang , WenSheng Zhang , YouKang Shi 2018

While manufacturers have been generating highly distributed data from various systems, devices and applications, a number of challenges in both data management and data analysis require new approaches to support the big data era. These challenges for industrial big data analytics is real-time analysis and decision-making from massive heterogeneous data sources in manufacturing space. This survey presents new concepts, methodologies, and applications scenarios of industrial big data analytics, which can provide dramatic improvements in velocity and veracity problem solving. We focus on five important methodologies of industrial big data analytics: 1) Highly distributed industrial data ingestion: access and integrate to highly distributed data sources from various systems, devices and applications; 2) Industrial big data repository: cope with sampling biases and heterogeneity, and store different data formats and structures; 3) Large-scale industrial data management: organizes massive heterogeneous data and share large-scale data; 4) Industrial data analytics: track data provenance, from data generation through data preparation; 5) Industrial data governance: ensures data trust, integrity and security. For each phase, we introduce to current research in industries and academia, and discusses challenges and potential solutions. We also examine the typical applications of industrial big data, including smart factory visibility, machine fleet, energy management, proactive maintenance, and just in time supply chain. These discussions aim to understand the value of industrial big data. Lastly, this survey is concluded with a discussion of open problems and future directions.

Databases

Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads

282 - Parmita Mehta , Sven Dorkenwald , Dongfang Zhao 2016

Scientific discoveries are increasingly driven by analyzing large volumes of image data. Many new libraries and specialized database management systems (DBMSs) have emerged to support such tasks. It is unclear, however, how well these systems support real-world image analysis use cases, and how performant are the image analytics tasks implemented on top of such systems. In this paper, we present the first comprehensive evaluation of large-scale image analysis systems using two real-world scientific image data processing use cases. We evaluate five representative systems (SciDB, Myria, Spark, Dask, and TensorFlow) and find that each of them has shortcomings that complicate implementation or hurt performance. Such shortcomings lead to new research opportunities in making large-scale image analysis both efficient and easy to use.

Databases

ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics

124 - Pengfei Liu 2021

With new emerging technologies, such as satellites and drones, archaeologists collect data over large areas. However, it becomes difficult to process such data in time. Archaeological data also have many different formats (images, texts, sensor data) and can be structured, semi-structured and unstructured. Such variety makes data difficult to collect, store, manage, search and analyze effectively. A few approaches have been proposed, but none of them covers the full data lifecycle nor provides an efficient data management system. Hence, we propose the use of a data lake to provide centralized data stores to host heterogeneous data, as well as tools for data quality checking, cleaning, transformation, and analysis. In this paper, we propose a generic, flexible and complete data lake architecture. Our metadata management system exploits goldMEDAL, which is the most complete metadata model currently available. Finally, we detail the concrete implementation of this architecture dedicated to an archaeological project.

Databases

comments

Fetching comments

Syrian Virtual University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Modern Data Formats for Big Bioinformatics Data Analytics

Ask ChatGPT about the research

No Arabic abstract

Read More