The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives exploration in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets rather than individual files. As a result, a task (comprising many jobs) has become the unit of petascale data processing on the Grid. Splitting a large data processing task into jobs enables fine-granularity checkpointing, analogous to splitting a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries, achieving reliable, six-sigma production quality in petascale data processing on the Grid. The computing experience of the ATLAS and CMS experiments provides a foundation for the reliability engineering needed to scale Grid technologies for data processing beyond the petascale.
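The retry-based recovery described in this abstract can be pictured with a minimal sketch. Everything here is illustrative rather than the experiments' actual production code: run_job is a hypothetical stand-in for submitting one Grid job, and the 5% transient failure rate and fixed retry budget are assumptions.

```python
import random

def run_job(job_id):
    """Hypothetical stand-in for submitting one Grid job; raises on a transient failure."""
    if random.random() < 0.05:  # assumed ~5% transient failure rate, for illustration only
        raise RuntimeError(f"transient failure in job {job_id}")
    return f"output of job {job_id}"

def run_task(job_ids, max_retries=3):
    """Run every job of a task, re-trying transient failures (analogous to TCP re-sends)."""
    results = {}
    for job_id in job_ids:
        for attempt in range(1, max_retries + 1):
            try:
                results[job_id] = run_job(job_id)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise  # persistent failure: surface it instead of retrying forever
    return results

if __name__ == "__main__":
    print(len(run_task(range(1000))), "jobs completed")
```

The point of the sketch is the analogy itself: the task plays the role of the file, the jobs play the role of packets, and bounded automatic re-tries recover transient losses without operator intervention.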
To run the large production tasks distributed worldwide efficiently, it is essential to provide shared production management tools composed of integrable and interoperable services. To enhance the ATLAS DC1 production toolkit, we introduced and tested a Virtual Data services component. For each major data transformation step identified in the ATLAS data processing pipeline (event generation, detector simulation, background pile-up and digitization, etc.), the Virtual Data Cookbook (VDC) catalogue encapsulates the specific data transformation knowledge and the validated parameter settings that must be provided before the data transformation is invoked. To provide local-remote transparency during DC1 production, the VDC database server delivered, in a controlled way, both the validated production parameters and the templated production recipes for thousands of event generation and detector simulation jobs around the world, simplifying the production management solution.
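A catalogue of templated recipes plus validated parameters might be sketched as follows. This is only an assumed, simplified picture of what the abstract describes: the dictionary layout, step names, command templates and the build_job_command helper are hypothetical and not taken from the VDC implementation.

```python
import string

# Hypothetical, minimal stand-in for a VDC-style catalogue: each transformation step
# maps to a templated recipe plus the validated parameter settings it requires.
VDC_CATALOGUE = {
    "evgen": {
        "recipe": "atlsim.exe -step evgen -run $run -events $events -seed $seed",
        "validated": {"events": 5000},
    },
    "simul": {
        "recipe": "atlsim.exe -step simul -run $run -input $input -seed $seed",
        "validated": {},
    },
}

def build_job_command(step, **job_params):
    """Fill a step's recipe template with validated defaults plus per-job parameters."""
    entry = VDC_CATALOGUE[step]
    params = {**entry["validated"], **job_params}
    return string.Template(entry["recipe"]).substitute(params)

# Example: a remote site asks only for the step name and its job-specific values;
# the validated settings come from the shared catalogue.
print(build_job_command("evgen", run=2001, seed=42))
```

The design intent captured here is local-remote transparency: production sites pull the same validated recipe from one catalogue instead of maintaining their own copies of transformation parameters.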
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
The physics goals of the next Large Hadron Collider run include high-precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models that simulate the expected data behavior. To highlight the role that modeling and simulation play in future scientific discovery, we report on use cases and experience with a unified system built to process both real and simulated data of growing volume and variety.
Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Fundamental design decisions in big data systems design typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL appears to be the technology that best fulfills its requirements. However, every big data application has different data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and with other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the big data file formats available for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
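As a purely illustrative aside, not drawn from the paper, the contrast between the four data models can be made concrete by expressing the same small "user follows user" fact in each; the field names and layout below are assumptions chosen only for readability.

```python
# Illustrative only: one "user follows user" fact in each of the four NoSQL data models.

# Document-oriented: one self-contained, JSON-like document per entity.
document = {"_id": "u42", "name": "Alice", "follows": ["u7", "u19"]}

# Key-value: an opaque value addressed by a single key; the application interprets the bytes.
key_value = {"user:u42": '{"name": "Alice", "follows": ["u7", "u19"]}'}

# Wide-column: rows grouped into column families, where columns may differ per row.
wide_column = {"u42": {"profile": {"name": "Alice"}, "follows": {"u7": 1, "u19": 1}}}

# Graph: entities become vertices and relationships become first-class edges.
graph_vertices = [("u42", {"name": "Alice"}), ("u7", {}), ("u19", {})]
graph_edges = [("u42", "follows", "u7"), ("u42", "follows", "u19")]
```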
With the advancement of technology, data is generated in our lives ever faster, and the amount of data that applications need to process has become extremely large. Therefore, we need to put more effort into analyzing data and extracting valuable information. Cloud computing has been a good technology for solving large-scale data analysis problems. However, in the era of the Internet of Things (IoT), transmitting sensing data back to the cloud for centralized analysis incurs substantial wireless communication and network transmission costs. To address this problem, edge computing has become a promising solution. In this paper, we propose a new algorithm for processing probabilistic skyline queries over uncertain data streams in an edge computing environment. We use the concept of a second skyline set to filter out data that is unlikely to be part of the skyline result. In addition, the edge server sends only the information needed to update the global analysis results on the cloud server, which greatly reduces the amount of data transmitted over the network. The results show that our proposed method not only reduces the response time by more than 50% compared with the brute-force method on two-dimensional data, but also maintains the fastest processing speed on high-dimensional data.
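A rough sketch of the second-skyline idea is given below, simplified to the deterministic case (no occurrence probabilities) and assuming that smaller is better in every dimension; the function names and the way layers are computed are assumptions for illustration, not the paper's algorithm.

```python
def dominates(p, q):
    """True if point p dominates q, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_layers(points):
    """Split points into the skyline and the 'second skyline' (dominated only by skyline points)."""
    skyline = [p for p in points if not any(dominates(q, p) for q in points if q != p)]
    rest = [p for p in points if p not in skyline]
    second = [p for p in rest if not any(dominates(q, p) for q in rest if q != p)]
    return skyline, second

# The edge server keeps only these two layers as local candidates and ships just the
# changes within them to the cloud, instead of forwarding the raw sensor stream.
sky, second = skyline_layers([(1, 5), (2, 3), (4, 4), (3, 6), (5, 1)])
print("skyline:", sky, "second skyline:", second)
```

The second layer acts as a safety margin: when a skyline point expires from the stream window, its replacement can usually be promoted from the second skyline without re-scanning the full window, which is what keeps the edge-to-cloud traffic small.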