How can AI Automate End-to-End Data Science?

119 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Djallel Bouneffouf

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Charu Aggarwal - Djallel Bouneffouf - Horst Samulowitz

الذكاء الاصطناعي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Data science is labor-intensive and human experts are scarce but heavily involved in every aspect of it. This makes data science time consuming and restricted to experts with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed towards that goal and is emerging as an important research and business topic. We introduce and define the AutoDS challenge, followed by a proposal of a general AutoDS framework that covers existing approaches but also provides guidance for the development of new methods. We categorize and review the existing literature from multiple aspects of the problem setup and employed techniques. Then we provide several views on how AI could succeed in automating end-to-end AutoDS. We hope this survey can serve as insightful guideline for the AutoDS field and provide inspiration for future research.

قيم البحث

382 - Marco Dinarelli , Nikita Kapoor , Bassam Jabaian 2020

End-to-end architectures have been recently proposed for spoken language understanding (SLU) and semantic parsing. Based on a large amount of data, those models learn jointly acoustic and linguistic-sequential features. Such architectures give very g ood results in the context of domain, intent and slot detection, their application in a more complex semantic chunking and tagging task is less easy. For that, in many cases, models are combined with an external language model to enhance their performance. In this paper we introduce a data efficient system which is trained end-to-end, with no additional, pre-trained external module. One key feature of our approach is an incremental training procedure where acoustic, language and semantic models are trained sequentially one after the other. The proposed model has a reasonable size and achieves competitive results with respect to state-of-the-art while using a small training dataset. In particular, we reach 24.02% Concept Error Rate (CER) on MEDIA/test while training on MEDIA/train without any additional data.

الحساب واللغة التعلم الآلي أنظمة الصوت في الحاسوب

Micro-Data Learning: The Other End of the Spectrum

103 - Jean-Baptiste Mouret 2016

Many fields are now snowed under with an avalanche of data, which raises considerable challenges for computer scientists. Meanwhile, robotics (among other fields) can often only use a few dozen data points because acquiring them involves a process th at is expensive or time-consuming. How can an algorithm learn with only a few data points?

الذكاء الاصطناعي التعلم الآلي علم الروبوتات

End-to-End Entity Classification on Multimodal Knowledge Graphs

198 - W.X. Wilcke 2020

End-to-end multimodal learning on knowledge graphs has been left largely unaddressed. Instead, most end-to-end models such as message passing networks learn solely from the relational information encoded in graphs structure: raw values, or literals, are either omitted completely or are stripped from their values and treated as regular nodes. In either case we lose potentially relevant information which could have otherwise been exploited by our learning methods. To avoid this, we must treat literals and non-literals as separate cases. We must also address each modality separately and accordingly: numbers, texts, images, geometries, et cetera. We propose a multimodal message passing network which not only learns end-to-end from the structure of graphs, but also from their possibly divers set of multimodal node features. Our model uses dedicated (neural) encoders to naturally learn embeddings for node features belonging to five different types of modalities, including images and geometries, which are projected into a joint representation space together with their relational information. We demonstrate our model on a node classification task, and evaluate the effect that each modality has on the overall performance. Our result supports our hypothesis that including information from multiple modalities can help our models obtain a better overall performance.

الذكاء الاصطناعي الحساب واللغة الرؤية الحاسوبية وتمييز الأنماط

Learning Reasoning Strategies in End-to-End Differentiable Proving

159 - Pasquale Minervini , Sebastian Riedel , Pontus Stenetorp 2020

Attempts to render deep learning models interpretable, data-efficient, and robust have seen some success through hybridisation with rule-based systems, for example, in Neural Theorem Provers (NTPs). These neuro-symbolic models can induce interpretabl e rules and learn representations from data via back-propagation, while providing logical explanations for their predictions. However, they are restricted by their computational complexity, as they need to consider all possible proof paths for explaining a goal, thus rendering them unfit for large-scale applications. We present Conditional Theorem Provers (CTPs), an extension to NTPs that learns an optimal rule selection strategy via gradient-based optimisation. We show that CTPs are scalable and yield state-of-the-art results on the CLUTRR dataset, which tests systematic generalisation of neural models by learning to reason over smaller graphs and evaluating on larger ones. Finally, CTPs show better link prediction results on standard benchmarks in comparison with other neural-symbolic models, while being explainable. All source code and datasets are available online, at https://github.com/uclnlp/ctp.

الذكاء الاصطناعي الحساب واللغة التعلم الآلي

An end-to-end data-driven optimisation framework for constrained trajectories

144 - Florent Dewez , Benjamin Guedj , Arthur Talpaert 2020

Many real-world problems require to optimise trajectories under constraints. Classical approaches are based on optimal control methods but require an exact knowledge of the underlying dynamics, which could be challenging or even out of reach. In this paper, we leverage data-driven approaches to design a new end-to-end framework which is dynamics-free for optimised and realistic trajectories. We first decompose the trajectories on function basis, trading the initial infinite dimension problem on a multivariate functional space for a parameter optimisation problem. A maximum emph{a posteriori} approach which incorporates information from data is used to obtain a new optimisation problem which is regularised. The penalised term focuses the search on a region centered on data and includes estimated linear constraints in the problem. We apply our data-driven approach to two settings in aeronautics and sailing routes optimisation, yielding commanding results. The developed approach has been implemented in the Python library PyRotor.

تطبيقات الإحصاء التعلم الآلي التحسين والتحكم