Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Giving Text Analytics a Boost

63 0 0.0 ( 0 )

Download Cite

Added by Raphael Polig

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Raphael Polig - Kubilay Atasu - Laura Chiticariu

Distributed Parallel and Cluster Computing Information Retrieval

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The amount of textual data has reached a new scale and continues to grow at an unprecedented rate. IBMs SystemT software is a powerful text analytics system, which offers a query-based interface to reveal the valuable information that lies within these mounds of data. However, traditional server architectures are not capable of analyzing the so-called Big Data in an efficient way, despite the high memory bandwidth that is available. We show that by using a streaming hardware accelerator implemented in reconfigurable logic, the throughput rates of the SystemTs information extraction queries can be improved by an order of magnitude. We present how such a system can be deployed by extending SystemTs existing compilation flow and by using a multi-threaded communication interface that can efficiently use the bandwidth of the accelerator.

rate research

Cortex: Harnessing Correlations to Boost Query Performance

169 - Vikram Nathan , Jialin Ding , Tim Kraska 2020

Databases employ indexes to filter out irrelevant records, which reduces scan overhead and speeds up query execution. However, this optimization is only available to queries that filter on the indexed attribute. To extend these speedups to queries on other attributes, database systems have turned to secondary and multi-dimensional indexes. Unfortunately, these approaches are restrictive: secondary indexes have a large memory footprint and can only speed up queries that access a small number of records, and multi-dimensional indexes cannot scale to more than a handful of columns. We present Cortex, an approach that takes advantage of correlations to extend the reach of primary indexes to more attributes. Unlike prior work, Cortex can adapt itself to any existing primary index, whether single or multi-dimensional, to harness a broad variety of correlations, such as those that exist between more than two attributes or have a large number of outliers. We demonstrate that on real datasets exhibiting these diverse types of correlations, Cortex matches or outperforms traditional secondary indexes with $5times$ less space, and it is $2-8times$ faster than existing approaches to indexing correlations.

Databases Information Retrieval

Whiz: A Fast and Flexible Data Analytics System

88 - Robert Grandl , Arjun Singhvi , Raajay Viswanathan 2017

Todays data analytics frameworks are compute-centric, with analytics execution almost entirely dependent on the pre-determined physical structure of the high-level computation. Relegating intermediate data to a second class entity in this manner hurts flexibility, performance, and efficiency. We present Whiz, a new analytics framework that cleanly separates computation from intermediate data. It enables runtime visibility into data via programmable monitoring, and data-driven computation (where intermediate data values drive when/what computation runs) via an event abstraction. Experiments with a Whiz prototype on a large cluster using batch, streaming, and graph analytics workloads show that its performance is 1.3-2x better than state-of-the-art.

Distributed Parallel and Cluster Computing

Adaptive Base Class Boost for Multi-class Classification

554 - Ping Li 2008

We develop the concept of ABC-Boost (Adaptive Base Class Boost) for multi-class classification and present ABC-MART, a concrete implementation of ABC-Boost. The original MART (Multiple Additive Regression Trees) algorithm has been very successful in large-scale applications. For binary classification, ABC-MART recovers MART. For multi-class classification, ABC-MART considerably improves MART, as evaluated on several public data sets.

Machine Learning Information Retrieval

High Bandwidth Memory on FPGAs: A Data Analytics Perspective

85 - Kaan Kara , Christoph Hagleitner , Dionysios Diamantopoulos 2020

FPGA-based data processing in datacenters is increasing in popularity due to the demands of modern workloads and the ensuing necessity for specialization in hardware. Driven by this trend, vendors are rapidly adapting reconfigurable devices to suit data and compute intensive workloads. Inclusion of High Bandwidth Memory (HBM) in FPGA devices is a recent example. HBM promises overcoming the bandwidth bottleneck, faced often by FPGA-based accelerators due to their throughput oriented design. In this paper, we study the usage and benefits of HBM on FPGAs from a data analytics perspective. We consider three workloads that are often performed in analytics oriented databases and implement them on FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. In certain cases, FPGA+HBM based solutions are able to surpass the highest performance provided by either a 2-socket POWER9 system or a 14-core XeonE5 by up to 1.8x (selection), 12.9x (join), and 3.2x (SGD).

Distributed Parallel and Cluster Computing Hardware Architecture

A Visual Analytics Framework for Reviewing Streaming Performance Data

99 - Suraj P. Kesavan , Takanori Fujiwara , Jianping Kelvin Li 2020

Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach due to the computational cost of applying offline algorithms to vast amounts of performance log data. Analyzing large streaming data is challenging because the rate of receiving data and limited time to comprehend data make it difficult for the analysts to sufficiently examine the data without missing important changes or patterns. To support streaming data analysis, we introduce a visual analytic framework comprising of three modules: data management, analysis, and interactive visualization. The data management module collects various computing and communication performance metrics from the monitored system using streaming data processing techniques and feeds the data to the other two modules. The analysis module automatically identifies important changes and patterns at the required latency. In particular, we introduce a set of online and progressive analysis methods for not only controlling the computational costs but also helping analysts better follow the critical aspects of the analysis results. Finally, the interactive visualization module provides the analysts with a coherent view of the changes and patterns in the continuously captured performance data. Through a multi-faceted case study on performance analysis of parallel discrete-event simulation, we demonstrate the effectiveness of our framework for identifying bottlenecks and locating outliers.

Distributed Parallel and Cluster Computing Human-Computer Interaction Machine Learning

comments

Fetching comments

Al-Etihad University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Giving Text Analytics a Boost

Ask ChatGPT about the research

No Arabic abstract

Read More