
DataSlicer: Task-Based Data Selection for Visual Data Exploration

Submitted by Rada Chirkova
Publication date: 2017
Research field: Informatics Engineering
Paper language: English





In visual exploration and analysis of data, determining how to select and transform the data for visualization is a challenge for users who are unfamiliar with the data or inexperienced in analysis. Our main hypothesis is that for many data sets and common analysis tasks, there are relatively few data slices that result in effective visualizations. By focusing human users on appropriate and suitably transformed parts of the underlying data sets, these data slices can help the users complete their task correctly. To verify this hypothesis, we develop a framework that permits us to capture exemplary data slices for a user task, and to explore and parse visual-exploration sequences into a format that makes them distinct and easy to compare. We develop a recommendation system, DataSlicer, that matches a currently viewed data slice with the most promising next effective data slices for the given exploration task. We report the results of controlled experiments with an implementation of the DataSlicer system, using four common analytical task types. The experiments demonstrate statistically significant improvements in accuracy and exploration speed versus users without access to our system.
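To make the matching step above concrete, the sketch below ranks a library of exemplary data slices by similarity to the currently viewed slice. It is a minimal illustration only: the slice representation, the Jaccard-style similarity, and names such as `DataSlice` and `recommend_next_slices` are assumptions for this sketch, not DataSlicer's actual data structures or matching logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSlice:
    """A selected-and-transformed view of the data (illustrative model)."""
    dimensions: frozenset              # attributes shown, e.g. {"state", "year"}
    measures: frozenset                # aggregated attributes, e.g. {"sales"}
    filters: frozenset = frozenset()   # (attribute, value) predicates

def slice_similarity(a: DataSlice, b: DataSlice) -> float:
    """Jaccard-style overlap of the two slices' specifications
    (an assumed similarity measure, not the paper's matching function)."""
    def jaccard(x, y):
        return len(x & y) / len(x | y) if (x | y) else 1.0
    return (jaccard(a.dimensions, b.dimensions)
            + jaccard(a.measures, b.measures)
            + jaccard(a.filters, b.filters)) / 3.0

def recommend_next_slices(current: DataSlice, exemplars: list, k: int = 3):
    """Rank the task's exemplary slices by similarity to the current view."""
    ranked = sorted(exemplars, key=lambda s: slice_similarity(current, s), reverse=True)
    return ranked[:k]

# Usage: suggest the next view for a per-state analysis task.
current = DataSlice(frozenset({"state"}), frozenset({"sales"}))
exemplars = [
    DataSlice(frozenset({"state", "year"}), frozenset({"sales"})),
    DataSlice(frozenset({"product"}), frozenset({"profit"})),
]
print(recommend_next_slices(current, exemplars, k=1))
```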


Read also

Materialized views and indexes are physical structures for accelerating data access that are commonly used in data warehouses. However, these data structures generate some maintenance overhead. They also share the same storage space. Most existing studies about materialized view and index selection consider these structures separately. In this paper, we adopt the opposite stance and couple materialized view and index selection to take view-index interactions into account and achieve efficient storage space sharing. Candidate materialized views and indexes are selected through a data mining process. We also exploit cost models that evaluate the respective benefit of indexing and view materialization, and help select a relevant configuration of indexes and materialized views among the candidates. Experimental results show that our strategy performs better than an independent selection of materialized views and indexes.
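As a rough illustration of coupling the two structure types under a shared storage budget, the sketch below greedily picks a mixed configuration of candidate indexes and materialized views by benefit per megabyte. The `Candidate` fields and the greedy rule are assumptions for this sketch; the paper's data-mining candidate generation and its cost models (which also account for maintenance overhead and view-index interactions) are not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    kind: str        # "index" or "view"
    size_mb: float   # storage cost
    benefit: float   # estimated query-cost reduction from a cost model

def select_configuration(candidates, storage_budget_mb):
    """Greedy benefit-per-MB selection of a mixed index/view configuration
    under a shared storage budget (illustrative only)."""
    chosen, used = [], 0.0
    for c in sorted(candidates, key=lambda c: c.benefit / c.size_mb, reverse=True):
        if used + c.size_mb <= storage_budget_mb:
            chosen.append(c)
            used += c.size_mb
    return chosen

candidates = [
    Candidate("idx_sales_date", "index", 120, 900),
    Candidate("mv_sales_by_region", "view", 400, 2500),
    Candidate("idx_customer_id", "index", 80, 300),
]
print([c.name for c in select_configuration(candidates, storage_budget_mb=500)])
```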
Unstructured data is now commonly queried by using target deep neural networks (DNNs) to produce structured information, e.g., object types and positions in video. As these target DNNs can be computationally expensive, recent work uses proxy models to produce query-specific proxy scores. These proxy scores are then used in downstream query processing algorithms for improved query execution speeds. Unfortunately, proxy models are often trained per-query, require large amounts of training data from the target DNN, and new training methods per query type. In this work, we develop an index construction method (task-agnostic semantic trainable index, TASTI) that produces reusable embeddings that can be used to generate proxy scores for a wide range of queries, removing the need for query-specific proxies. We observe that many queries over the same dataset only require access to the schema induced by the target DNN. For example, an aggregation query counting the number of cars and a selection query selecting frames of cars require only the object types per frame of video. To leverage this opportunity, TASTI produces embeddings per record that have the key property that close embeddings have similar extracted attributes under the induced schema. Given this property, we show that clustering by embeddings can be used to answer downstream queries efficiently. We theoretically analyze TASTI and show that low training error guarantees downstream query accuracy for a natural class of queries. We evaluate TASTI on four video and text datasets, and three query types. We show that TASTI can be 10x less expensive to construct than proxy models and can outperform them by up to 24x at query time.
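The clustering idea can be sketched as follows: cluster the per-record embeddings, run the expensive target DNN only on one representative per cluster, and propagate the representative's extracted attributes to the cluster's members to answer, for example, an aggregation query approximately. Everything below is illustrative: the embeddings are random stand-ins, `expensive_target_dnn` is a stub, and k-means is just one possible clustering choice, not TASTI's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 64))   # one embedding per record (stand-in)

def expensive_target_dnn(record_id: int) -> int:
    """Stand-in for the costly target DNN: returns a car count per frame."""
    return int(record_id % 3)

# Cluster the embeddings so nearby records share extracted attributes.
k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)

# Run the target DNN only on one representative per cluster ...
rep_ids = [int(np.argmin(np.linalg.norm(embeddings - c, axis=1)))
           for c in kmeans.cluster_centers_]
rep_labels = {cluster: expensive_target_dnn(rid)
              for cluster, rid in enumerate(rep_ids)}

# ... and propagate each representative's attribute to its cluster's members
# to answer an aggregation query ("total number of cars") approximately.
approx_total = sum(rep_labels[c] for c in kmeans.labels_)
print(approx_total)
```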
In the Big Data era, information integration often requires abundant data extracted from massive data sources. Because of the large number of data sources, data source selection plays a crucial role in information integration, since it is costly and even impossible to access all data sources. Data source selection should consider both efficiency and effectiveness. For efficiency, the approach should achieve high performance and scale to large numbers of data sources. From the effectiveness aspect, data quality and the overlap among sources must be considered, since data quality varies considerably across data sources, with significant differences in the accuracy and coverage of the data provided, and overlapping sources can even lower the quality of the data integrated from the selected sources. In this paper, we study the source selection problem in the Big Data era and propose methods which can scale to datasets with up to millions of data sources and guarantee the quality of results. We propose a new objective function that takes the expected number of true values a source can provide as the criterion for evaluating the contribution of a data source. Based on our proposed index, we present a scalable algorithm and two pruning strategies that improve efficiency without sacrificing precision. Experimental results on both real-world and synthetic data sets show that our methods can efficiently select sources providing a large proportion of true values and can scale to massive numbers of data sources.
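A simple way to picture the objective is a greedy selection that repeatedly adds the source with the largest expected number of new true values, i.e., its accuracy times the number of objects it covers beyond the sources already chosen. The sketch below uses this illustrative objective and a plain greedy loop; the paper's exact objective function, index structure, and pruning strategies are not reproduced.

```python
def expected_new_true_values(source, covered):
    """Expected number of true values a source adds beyond what is covered:
    accuracy times the number of not-yet-covered objects it provides
    (an illustrative objective, not the paper's exact formula)."""
    return source["accuracy"] * len(source["objects"] - covered)

def select_sources(sources, budget):
    """Greedily pick up to `budget` sources maximizing marginal expected gain."""
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(sources, key=lambda s: expected_new_true_values(s, covered), default=None)
        if best is None or expected_new_true_values(best, covered) <= 0:
            break
        chosen.append(best["name"])
        covered |= best["objects"]
        sources = [s for s in sources if s["name"] != best["name"]]
    return chosen

sources = [
    {"name": "A", "accuracy": 0.9,  "objects": {1, 2, 3, 4}},
    {"name": "B", "accuracy": 0.7,  "objects": {3, 4, 5, 6, 7}},
    {"name": "C", "accuracy": 0.95, "objects": {1, 2}},
]
print(select_sources(sources, budget=2))   # e.g. ['A', 'B']
```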
Task-based functional magnetic resonance imaging (task fMRI) is a non-invasive technique for identifying brain regions whose activity changes when individuals are asked to perform a given task. This contributes to the understanding of how the human brain is organized into functionally distinct subdivisions. Task fMRI experiments from high-resolution scans provide hundreds of thousands of longitudinal signals for each individual, corresponding to measurements of brain activity over each voxel of the brain along the duration of the experiment. In this context, we propose visualization techniques for high-dimensional functional data that rely on depth-based notions, allow for computationally efficient two-dimensional representations of task fMRI data, and shed light on sample composition, outlier presence, and individual variability. We believe this step is crucial prior to any inferential approach that aims to identify neuroscientific patterns across individuals, tasks, and brain regions. We illustrate the proposed technique through a simulation study and demonstrate its application on a motor and language task fMRI experiment.
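One common depth notion for functional data is the modified band depth, which scores a curve by how often it lies inside bands spanned by pairs of other curves; deep (central) signals score high and outlying signals score low, which is the kind of information a depth-based two-dimensional representation builds on. The sketch below is a plain brute-force illustration of that idea, not the paper's specific depth construction or layout.

```python
import numpy as np
from itertools import combinations

def modified_band_depth(curves: np.ndarray) -> np.ndarray:
    """Modified band depth (order 2) for curves of shape (n_curves, n_points):
    for each curve, the average proportion of time points at which it lies
    inside the band spanned by a pair of sample curves."""
    n, _ = curves.shape
    depths = np.zeros(n)
    pairs = list(combinations(range(n), 2))
    for k in range(n):
        inside = [np.mean((curves[k] >= np.minimum(curves[i], curves[j])) &
                          (curves[k] <= np.maximum(curves[i], curves[j])))
                  for i, j in pairs]
        depths[k] = np.mean(inside)
    return depths

# Usage: central signals get high depth, outlying signals get low depth.
t = np.linspace(0, 1, 100)
signals = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(1).normal(size=(20, 100))
signals[0] += 3.0                                       # inject an outlying signal
print(np.argsort(modified_band_depth(signals))[:3])     # lowest-depth (most outlying) curves
```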
Stephane Azefack, 2008
Analytical queries defined on data warehouses are complex and use several join operations that are very costly, especially when run on very large data volumes. To improve response times, data warehouse administrators commonly use indexing techniques. This task is nevertheless complex and tedious. In this paper, we present an automatic, dynamic index selection method for data warehouses that is based on incremental frequent itemset mining from a given query workload. The main advantage of this approach is that it helps update the set of selected indexes when the workload evolves, instead of recreating it from scratch. Preliminary experimental results illustrate the efficiency of this approach, both in terms of performance enhancement and overhead.
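As a rough sketch of the workload-driven idea, the class below counts the attribute sets referenced by incoming queries and proposes the frequent ones as candidate indexes, updating the counts incrementally as the workload evolves. The class name, thresholds, and the naive counting are assumptions for illustration; the paper's incremental frequent-itemset-mining algorithm and index cost evaluation are not reproduced.

```python
from collections import Counter
from itertools import combinations

class DynamicIndexAdvisor:
    """Illustrative sketch: mine frequent attribute sets from the query workload
    and propose them as candidate indexes, updating counts incrementally."""

    def __init__(self, min_support: int = 3, max_width: int = 2):
        self.counts = Counter()
        self.min_support = min_support
        self.max_width = max_width

    def observe_query(self, attributes: set) -> None:
        """Update itemset counts with the attributes referenced by one query."""
        for width in range(1, self.max_width + 1):
            for itemset in combinations(sorted(attributes), width):
                self.counts[itemset] += 1

    def recommended_indexes(self) -> list:
        """Attribute sets whose support in the workload exceeds the threshold."""
        return [s for s, c in self.counts.items() if c >= self.min_support]

advisor = DynamicIndexAdvisor(min_support=2)
for q in [{"date", "store"}, {"date", "store"}, {"product"}]:
    advisor.observe_query(q)
print(advisor.recommended_indexes())   # e.g. [('date',), ('store',), ('date', 'store')]
```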