Continuous Prefetch for Interactive Data Applications

Added by Haneen Mohammed

Publication date 2020

Field Informatics Engineering

Research language English

Authors Haneen Mohammed - Ziyun Wei - Eugene Wu

Databases

Abstract

Interactive data visualization and exploration (DVE) applications are often network-bottlenecked due to bursty request patterns, large response sizes, and heterogeneous deployments over a range of networks and devices. This makes it difficult to ensure consistently low response times (< 100 ms). Khameleon is a framework for DVE applications that uses a novel combination of prefetching and response tuning to dynamically trade off response quality for low latency. Khameleon exploits DVEs' approximation tolerance: immediate lower-quality responses are preferable to waiting for complete results. To this end, Khameleon progressively encodes responses and runs a server-side scheduler that proactively streams portions of responses using available bandwidth to maximize users' perceived interactivity. The scheduler involves a complex optimization based on available resources, predicted user interactions, and response quality levels; yet its decisions must also be made in real time. To overcome this, Khameleon uses a fast greedy approximation that closely mimics the optimal approach. Using image exploration and visualization applications with real user interaction traces, we show that across a wide range of network and client resource conditions, Khameleon outperforms classic prefetching approaches that benefit from perfect prediction models: response latencies with Khameleon are never higher, and typically 2 to 3 orders of magnitude lower, while response quality remains within 50%-80%.
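The greedy scheduling idea in the abstract can be illustrated with a minimal sketch. This is not Khameleon's actual scheduler: the function names, the block-count bandwidth budget, and the square-root quality curve are all assumptions made for illustration. Each candidate response is split into progressively encoded blocks, and the scheduler repeatedly sends the block with the highest expected marginal quality gain (predicted request probability times the quality improvement from one more block).

```python
import heapq
from typing import Callable, Dict, List, Tuple


def greedy_schedule(
    probs: Dict[str, float],          # hypothetical: predicted probability of each request
    total_blocks: Dict[str, int],     # hypothetical: number of progressive blocks per response
    utility: Callable[[int, int], float],  # quality in [0, 1] after `sent` of `total` blocks
    budget: int,                      # assumed bandwidth budget, counted in blocks
) -> List[Tuple[str, int]]:
    """Return the order of (request, block_index) pairs to push to the client."""

    def gain(req: str, sent: int) -> float:
        # Expected marginal quality of sending one more block of `req`.
        n = total_blocks[req]
        return probs[req] * (utility(sent + 1, n) - utility(sent, n))

    # Max-heap via negated gains; each entry is the next unsent block of a request.
    heap = [(-gain(r, 0), r, 0) for r in probs if total_blocks[r] > 0]
    heapq.heapify(heap)

    schedule: List[Tuple[str, int]] = []
    while heap and len(schedule) < budget:
        _, req, sent = heapq.heappop(heap)
        schedule.append((req, sent))
        if sent + 1 < total_blocks[req]:
            heapq.heappush(heap, (-gain(req, sent + 1), req, sent + 1))
    return schedule


def sqrt_utility(sent: int, total: int) -> float:
    # Assumed concave quality curve: early blocks contribute the most.
    return (sent / total) ** 0.5


if __name__ == "__main__":
    plan = greedy_schedule(
        probs={"a": 0.6, "b": 0.3, "c": 0.1},
        total_blocks={"a": 4, "b": 4, "c": 4},
        utility=sqrt_utility,
        budget=4,
    )
    print(plan)
```

With a concave quality curve, the sketch hedges across likely requests: it sends the first block of a less likely response before finishing the most likely one, which mirrors the abstract's preference for immediate lower-quality responses over complete but late ones.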

Probabilistic Data with Continuous Distributions

91 - Martin Grohe , Benjamin Lucien Kaminski , Joost-Pieter Katoen and Peter Lindner 2021

Statistical models of real world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including probabilistic database systems. Yet, the traditional theoretical framework of probabilistic databases focusses entirely on finite probabilistic databases. Only recently, we set out to develop the mathematical theory of infinite probabilistic databases. The present paper is an exposition of two recent papers which are cornerstones of this theory. In (Grohe, Lindner; ICDT 2020) we propose a very general framework for probabilistic databases, possibly involving continuous probability distributions, and show that queries have a well-defined semantics in this framework. In (Grohe, Kaminski, Katoen, Lindner; PODS 2020) we extend the declarative probabilistic programming language Generative Datalog, proposed by (Barany et al. 2017) for discrete probability distributions, to continuous probability distributions and show that such programs yield generative models of continuous probabilistic databases.

Databases

Industrial Big Data Analytics: Challenges, Methodologies, and Applications

72 - JunPing Wang , WenSheng Zhang , YouKang Shi 2018

While manufacturers have been generating highly distributed data from various systems, devices and applications, a number of challenges in both data management and data analysis require new approaches to support the big data era. The challenge for industrial big data analytics is real-time analysis and decision-making from massive heterogeneous data sources in the manufacturing space. This survey presents new concepts, methodologies, and application scenarios of industrial big data analytics, which can provide dramatic improvements in solving velocity and veracity problems. We focus on five important methodologies of industrial big data analytics: 1) Highly distributed industrial data ingestion: access and integrate highly distributed data sources from various systems, devices and applications; 2) Industrial big data repository: cope with sampling biases and heterogeneity, and store different data formats and structures; 3) Large-scale industrial data management: organize massive heterogeneous data and share large-scale data; 4) Industrial data analytics: track data provenance, from data generation through data preparation; 5) Industrial data governance: ensure data trust, integrity and security. For each phase, we introduce current research in industry and academia, and discuss challenges and potential solutions. We also examine the typical applications of industrial big data, including smart factory visibility, machine fleet, energy management, proactive maintenance, and just-in-time supply chain. These discussions aim to clarify the value of industrial big data. Lastly, the survey concludes with a discussion of open problems and future directions.

Databases

Continuous Queries for Multi-Relational Graphs

401 - Sutanay Choudhury , Lawrence B. Holder , Abhik Ray 2012

Acting on time-critical events by processing ever-growing social media or news streams is a major technical challenge. Many of these data sources can be modeled as multi-relational graphs. Continuous queries, or techniques to search for rare events that typically arise in monitoring applications, have been studied extensively for relational databases. This work is dedicated to answering the question that emerges naturally: how can we efficiently execute a continuous query on a dynamic graph? This paper presents an exact subgraph search algorithm that exploits the temporal characteristics of representative queries for online news or social media monitoring. The algorithm is based on a novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the structural and semantic characteristics of the underlying multi-relational graph. The paper concludes with extensive experimentation on several real-world datasets that demonstrates the validity of this approach.

Databases Social and Information Networks

Non-Interactive Differential Privacy: a Survey

516 - David Leoni 2012

The OpenData movement around the globe is demanding more access to information which lies locked in public or private servers. As recently reported by a McKinsey publication, this data has significant economic value, yet its release has the potential to blatantly conflict with people's privacy. Recent UK government inquiries have shown concern from various parties about the publication of anonymized databases, as there is a concrete possibility of user identification by means of linkage attacks. Differential privacy stands out as a model that provides strong formal guarantees about the anonymity of the participants in a sanitized database. Only recent results have demonstrated its applicability to real-life datasets, though. This paper covers such breakthrough discoveries by reviewing applications of differential privacy for non-interactive publication of anonymized real-life datasets. Theory, utility and a data-aware comparison are discussed on a variety of principles and concrete applications.

Databases

Interactive query expansion for professional search applications

345 - Tony Russell-Rose , Philip Gooch , Udo Kruschwitz 2021

Knowledge workers (such as healthcare information professionals, patent agents and recruitment professionals) undertake work tasks where search forms a core part of their duties. In these instances, the search task is often complex and time-consuming and requires specialist expert knowledge to formulate accurate search strategies. Interactive features such as query expansion can play a key role in supporting these tasks. However, generating query suggestions within a professional search context requires that consideration be given to the specialist, structured nature of the search strategies these professionals employ. In this paper, we investigate a variety of query expansion methods applied to a collection of Boolean search strategies used in a variety of real-world professional search tasks. The results demonstrate the utility of context-free distributional language models and the value of using linguistic cues such as n-gram order to optimise the balance between precision and recall.

Information Retrieval Human-Computer Interaction
