In recent years, there has been a substantial amount of work on large-scale data analytics using Hadoop-based platforms running on large clusters of commodity machines. A less-explored topic is how those data, dominated by application logs, are collected and structured in the first place. In this paper, we present Twitter's production logging infrastructure and its evolution from application-specific logging to a unified client events log format, where messages are captured in common, well-formatted, flexible Thrift messages. Since most analytics tasks consider the user session as the basic unit of analysis, we pre-materialize session sequences, which are compact summaries that can answer a large class of common queries quickly. The development of this infrastructure has streamlined log collection and data analysis, thereby improving our ability to rapidly experiment and iterate on various aspects of the service.
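The abstract does not say how session sequences are built, so the following is only a minimal sketch under stated assumptions. It assumes client events arrive as hypothetical `(user_id, timestamp, event_name)` tuples, that a session ends after an assumed 30-minute inactivity gap, and that each event name is mapped to a single character (more frequent events first) so that a whole session becomes a short string. The names `build_alphabet`, `sessionize`, `count_sessions_with`, and `SESSION_GAP_SECONDS` are illustrative, not part of Twitter's system.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed inactivity gap (seconds) that closes a session; hypothetical value.
SESSION_GAP_SECONDS = 30 * 60

# Hypothetical event record: (user_id, unix_timestamp, event_name).
Event = Tuple[str, int, str]


def build_alphabet(events: List[Event]) -> Dict[str, str]:
    """Map each event name to one character, most frequent events first."""
    counts = Counter(name for _, _, name in events)
    # Sort by descending frequency, breaking ties by name for determinism.
    ordered = sorted(counts, key=lambda n: (-counts[n], n))
    # Start at 'A' so the encoded sessions stay readable in this sketch.
    return {name: chr(ord("A") + i) for i, name in enumerate(ordered)}


def sessionize(events: List[Event], alphabet: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split each user's events into sessions and encode each session as a string."""
    by_user: Dict[str, List[Tuple[int, str]]] = {}
    for user, ts, name in events:
        by_user.setdefault(user, []).append((ts, name))

    sessions: List[Tuple[str, str]] = []
    for user in sorted(by_user):
        seq = sorted(by_user[user])  # order events by timestamp
        current = [alphabet[seq[0][1]]]
        for (prev_ts, _), (ts, name) in zip(seq, seq[1:]):
            if ts - prev_ts > SESSION_GAP_SECONDS:
                # Gap too large: close the current session and start a new one.
                sessions.append((user, "".join(current)))
                current = []
            current.append(alphabet[name])
        sessions.append((user, "".join(current)))
    return sessions


def count_sessions_with(sessions: List[Tuple[str, str]],
                        alphabet: Dict[str, str], event_name: str) -> int:
    """Answer 'how many sessions contain event X?' using only the compact strings."""
    symbol = alphabet[event_name]
    return sum(1 for _, encoded in sessions if symbol in encoded)


# Usage example with made-up events.
events: List[Event] = [
    ("u1", 0, "home:view"),
    ("u1", 60, "tweet:click"),
    ("u1", 4000, "home:view"),   # more than 30 minutes later: new session
    ("u2", 10, "home:view"),
    ("u2", 20, "search:query"),
]
alphabet = build_alphabet(events)
sessions = sessionize(events, alphabet)
print(sessions)  # [('u1', 'AC'), ('u1', 'A'), ('u2', 'AB')]
print(count_sessions_with(sessions, alphabet, "home:view"))  # 3
```

Encoding each session as a string means many common queries (event presence, counts, simple funnels) reduce to string operations over a small pre-computed table instead of a scan of the raw logs; giving frequent events small code points is one plausible way to keep that table compact.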
Next Generation Sequencing (NGS) technology has resulted in massive amounts of proteomics and genomics data. This data is of no use if it is not properly analyzed. ETL (Extraction, Transformation, Loading) is an important step in designing data analy
The ASTRI (Astrofisica con Specchi a Tecnologia Replicante Italiana) Mini-Array (MA) project is an international collaboration led by the Italian National Institute for Astrophysics (INAF). ASTRI MA is composed of nine Cherenkov telescopes operating
With emerging technologies such as satellites and drones, archaeologists can collect data over large areas. However, processing such data in a timely manner becomes difficult. Archaeological data also have many different formats (images, texts, sensor data)
A new type of log, the command log, is being employed to replace the traditional data log (e.g., ARIES log) in in-memory databases. Instead of recording how the tuples are updated, a command log only tracks the transactions being executed, there
Smart meters are increasingly used worldwide. Smart meters are advanced meters capable of measuring energy consumption at a fine-grained time interval, e.g., every 15 minutes. Smart meter data are typically bundled with socio-economic data in an