Towards a Workload for Evolutionary Analytics

Published by Jagan Sankaranarayanan

Publication date: 2013

Research field: Informatics Engineering

Paper language: English

Authors: Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus

Databases; Distributed, Parallel, and Cluster Computing; Performance

Emerging data analysis involves the ingestion and exploration of new data sets, application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify several metrics to test system support for evolutionary analytics. Along with our metrics, we present methodologies for running the workload that capture this analytical scenario.

79 - Jia Zou, Amitabh Das, Pratik Barhate 2020

Persistent partitioning is effective in avoiding expensive shuffling operations. However, it remains a significant challenge to automate this process for Big Data analytics workloads that extensively use user-defined functions (UDFs), where sub-computations are harder to reuse for partitioning than in relational applications. In addition, functional dependency, which is widely utilized for partitioning selection, is often unavailable in the unstructured data that is ubiquitous in UDF-centric analytics. We propose the Lachesis system, which represents UDF-centric workloads as workflows of analyzable and reusable sub-computations. Lachesis further adopts a deep reinforcement learning model to infer which sub-computations should be used to partition the underlying data. This analysis is then applied to automatically optimize the storage of the data across applications to improve performance and user productivity.

Databases; Distributed, Parallel, and Cluster Computing

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

77 - Gregory M. Essertel, Ruby Y. Tahboub, James M. Decker 2017

The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While Spark has made impressive progress, we show that for relational workloads, there is still a significant gap compared with best-of-breed query engines. And when stepping outside of the relational world, query optimization techniques are ineffective if large parts of a computation have to be treated as user-defined functions (UDFs). We present Flare: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark. We demonstrate order-of-magnitude speedups both for relational workloads such as TPC-H and for a range of machine learning kernels that combine relational and iterative functional processing. Flare achieves these results through (1) compilation to native code, (2) replacing parts of the Spark runtime system, and (3) extending the scope of optimization and code generation to large classes of UDFs.

Databases; Distributed, Parallel, and Cluster Computing; Performance

Towards Semantic Big Graph Analytics for Cross-Domain Knowledge Discovery

94 - Feichen Shen 2019

In recent years, the volume of big linked data has grown rapidly, and it is still rising. Big linked data and knowledge bases come from different domains such as life sciences, publications, media, the social web, and so on. However, with the rapid increase in data, it is very challenging for people to acquire a comprehensive collection of cross-domain knowledge to meet their needs. Under these circumstances, it is extremely difficult for people without expertise to extract knowledge from various domains. As a result, limited human knowledge cannot meet the high demand for discovering large amounts of cross-domain knowledge. In this research, we present a big graph analytics framework that aims to address this issue by providing semantic methods to facilitate the management of big graph data from close domains in order to discover cross-domain knowledge in a more accurate and efficient way.

Databases

Towards Million-Server Network Simulations on Just a Laptop

77 - Maciej Besta, Marcel Schneider, Salvatore Di Girolamo 2021

The growing size of data center and HPC networks poses unprecedented requirements on the scalability of simulation infrastructure. The ability to simulate such large-scale interconnects on a simple PC would facilitate research efforts. Unfortunately, as we first show in this work, existing shared-memory packet-level simulators do not scale to the sizes of the largest networks considered today. We then present a feasibility analysis and a set of enhancements that enable a simple packet-level htsim simulator to scale to unprecedented simulation sizes on a single PC. Our code is available online and can be used to design novel schemes in the coming era of omnipresent data centers and HPC clusters.

Networking and Internet Architecture; Distributed, Parallel, and Cluster Computing; Performance

A workload-adaptive mechanism for linear queries under local differential privacy

91 - Ryan McKenna, Raj Kumar Maity, Arya Mazumdar 2020

We propose a new mechanism to accurately answer a user-provided set of linear counting queries under local differential privacy (LDP). Given a set of linear counting queries (the workload), our mechanism automatically adapts to provide accuracy on the workload queries. We define a parametric class of mechanisms that produce unbiased estimates of the workload, and formulate a constrained optimization problem to select a mechanism from this class that minimizes expected total squared error. We solve this optimization problem numerically using projected gradient descent and provide an efficient implementation that scales to large workloads. We demonstrate the effectiveness of our optimization-based approach in a wide variety of settings, showing that it outperforms many competitors, even outperforming existing mechanisms on the workloads for which they were intended.

Databases; Cryptography and Security
