Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Hokusai - Sketching Streams in Real Time

281 0 0.0 ( 0 )

Download Cite

Added by Sergiy Matusevych

Publication date 2012

fields Informatics Engineering

and research's language is English

Authors Sergiy Matusevych - Alex Smola - Amr Ahmed

Databases Data Structures and Algorithms

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We describe Hokusai, a real time system which is able to capture frequency information for streams of arbitrary sequences of symbols. The algorithm uses the CountMin sketch as its basis and exploits the fact that sketching is linear. It provides real time statistics of arbitrary events, e.g. streams of queries as a function of time. We use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. Queries can be answered in constant time.

rate research

Real-Time Anomaly Detection in Edge Streams

313 - Siddharth Bhatia , Rui Liu , Bryan Hooi 2020

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose MIDAS-F, to solve the problem by which anomalies are incorporated into the algorithms internal states, creating a `poisoning effect that can allow future anomalies to slip through undetected. MIDAS-F introduces two modifications: 1) We modify the anomaly scoring function, aiming to reduce the `poisoning effect of newly arriving edges; 2) We introduce a conditional merge step, which updates the algorithms data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the `poisoning effect. Experiments show that MIDAS-F has significantly higher accuracy than MIDAS. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data orders-of-magnitude faster than state-of-the-art approaches; (c) it provides up to 62% higher ROC-AUC than state-of-the-art approaches.

Machine Learning Social and Information Networks Machine Learning

Robust real-time monitoring of high-dimensional data streams

130 - Ruizhi Zhang , Yajun Mei , Jianjun Shi 2019

Robust real-time monitoring of high-dimensional data streams has many important real-world applications such as industrial quality control, signal detection, biosurveillance, but unfortunately it is highly non-trivial to develop efficient schemes due to two challenges: (1) the unknown sparse number or subset of affected data streams and (2) the uncertainty of model specification for high-dimensional data. In this article, motivated by the detection of smaller persistent changes in the presence of larger transient outliers, we develop a family of efficient real-time robust detection schemes for high-dimensional data streams through monitoring feature spaces such as PCA or wavelet coefficients when the feature coefficients are from Tukey-Hubers gross error models with outliers. We propose to construct a new local detection statistic for each feature called $L_{alpha}$-CUSUM statistic that can reduce the effect of outliers by using the Box-Cox transformation of the likelihood function, and then raise a global alarm based upon the sum of the soft-thresholding transformation of these local $L_{alpha}$-CUSUM statistics so that to filter out unaffected features. In addition, we propose a new concept called false alarm breakdown point to measure the robustness of online monitoring schemes, and also characterize the breakdown point of our proposed schemes. Asymptotic analysis, extensive numerical simulations and case study of nonlinear profile monitoring are conducted to illustrate the robustness and usefulness of our proposed schemes.

Methodology Statistics Theory Statistics Theory

Dimensionality reduction of SDPs through sketching

209 - Andreas Bluhm , Daniel Stilck Franca 2017

We show how to sketch semidefinite programs (SDPs) using positive maps in order to reduce their dimension. More precisely, we use Johnsonhyp{}Lindenstrauss transforms to produce a smaller SDP whose solution preserves feasibility or approximates the value of the original problem with high probability. These techniques allow to improve both complexity and storage space requirements. They apply to problems in which the Schatten 1-norm of the matrices specifying the SDP and also of a solution to the problem is constant in the problem size. Furthermore, we provide some results which clarify the limitations of positive, linear sketches in this setting.

Optimization and Control Data Structures and Algorithms

L-Store: A Real-time OLTP and OLAP System

85 - Mohammad Sadoghi , Souvik Bhattacherjee , Bishwaranjan Bhattacharjee 2016

Arguably data is the new natural resource in the enterprise world with an unprecedented degree of proliferation. But to derive real-time actionable insights from the data, it is important to bridge the gap between managing the data that is being updated at a high velocity (i.e., OLTP) and analyzing a large volume of data (i.e., OLAP). However, there has been a divide where specialized solutions were often deployed to support either OLTP or OLAP workloads but not both; thus, limiting the analysis to stale and possibly irrelevant data. In this paper, we present Lineage-based Data Store (L-Store) that combines the real-time processing of transactional and analytical workloads within a single unified engine by introducing a novel lineage-based storage architecture. By exploiting the lineage, we develop a contention-free and lazy staging of columnar data from a write-optimized form (suitable for OLTP) into a read-optimized form (suitable for OLAP) in a transactionally consistent approach that also supports querying and retaining the current and historic data. Our working prototype of L-Store demonstrates its superiority compared to state-of-the-art approaches under a comprehensive experimental evaluation.

Databases

Cloud based Real-Time and Low Latency Scientific Event Analysis

259 - Chen Yang , Xiaofeng Meng , Zhihui Du 2018

Astronomy is well recognized as big data driven science. As the novel observation infrastructures are developed, the sky survey cycles have been shortened from a few days to a few seconds, causing data processing pressure to shift from offline to online. However, existing scientific databases focus on offline analysis of long-term historical data, not real-time and low latency analysis of large-scale newly arriving data. In this paper, a cloud based method is proposed to efficiently analyze scientific events on large-scale newly arriving data. The solution is implemented as a highly efficient system, namely Aserv. A set of compact data store and index structures are proposed to describe the proposed scientific events and a typical analysis pattern is formulized as a set of query operations. Domain aware filter, accuracy aware data partition, highly efficient index and frequently used statistical data designs are four key methods to optimize the performance of Aserv. Experimental results under the typical cloud environment show that the presented optimization mechanism can meet the low latency demand for both large data insertion and scientific event analysis. Aserv can insert 3.5 million rows of data within 3 seconds and perform the heaviest query on 6.7 billion rows of data also within 3 seconds. Furthermore, a performance model is given to help Aserv choose the right cloud resource setup to meet the guaranteed real-time performance requirement.

Databases

comments

Fetching comments

Peninsula Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Hokusai - Sketching Streams in Real Time

Ask ChatGPT about the research

No Arabic abstract

Read More