Efficient Construction of Nonlinear Models over Normalized Data

Published by Xiaohui Yu

Publication date: 2020

Research field: Information Engineering

Paper language: English

Authors: Zhaoyue Chen, Nick Koudas, Zhe Zhang

Machine Learning, Databases

Abstract

Machine Learning (ML) applications are proliferating in the enterprise. Relational data, which are prevalent in enterprise applications, are typically normalized; as a result, the data have to be denormalized via primary/foreign-key joins before they can be provided as input to ML algorithms. In this paper, we study the implementation of two popular nonlinear ML models, Gaussian Mixture Models (GMM) and Neural Networks (NN), over normalized data, addressing both binary and multi-way joins over normalized relations. For GMM, we show how to decompose the computation systematically, for both binary and multi-way joins, to construct mixture models. We demonstrate that by factoring the computation, the models can be trained much faster than with other applicable approaches, without any loss in accuracy. For NN, we propose algorithms that train the network taking normalized data as input. Similarly, we present algorithms that conduct the training of the network in a factorized way and offer performance advantages. The redundancy introduced by denormalization can be exploited for certain types of activation functions. However, we demonstrate that exploiting this redundancy is helpful only up to a certain point; exploiting redundancy at higher layers of the network always results in increased costs and is not recommended. We present the results of a thorough experimental evaluation, varying several parameters of the input relations involved, and demonstrate that our proposals for training GMM and NN yield drastic performance improvements, typically starting at 100% and growing as parameters of the underlying data vary, without any loss in accuracy.
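The factorization idea can be illustrated on a binary PK/FK join S ⋈ R in which every S tuple carries a foreign key `fk` into R. The sketch below is not the authors' algorithm; it is a minimal NumPy illustration that assumes one fact table S and one dimension table R with dense numeric features. Any term of a per-tuple computation that depends only on R attributes is evaluated once per R tuple and then looked up through the foreign key, instead of once per joined tuple. It shows this for two such terms: the Mahalanobis quadratic form inside a Gaussian component's log-density (with the precision matrix partitioned into S and R blocks), and the pre-activation of a network's first dense layer. The function names `join_materialized`, `gmm_quadform_factorized`, and `first_layer_factorized` are hypothetical.

```python
import numpy as np


def join_materialized(XS, fk, XR):
    """Denormalized view: each S row concatenated with the R row its foreign key points to."""
    return np.hstack([XS, XR[fk]])


def gmm_quadform_factorized(XS, fk, XR, mu, prec):
    """(x - mu)^T P (x - mu) for every joined tuple x = [x_S, x_R], without materializing the join.

    prec (P) must be symmetric; it is partitioned into blocks P_SS, P_SR, P_RR
    following the column order [S features, R features].
    """
    dS = XS.shape[1]
    P_SS, P_SR, P_RR = prec[:dS, :dS], prec[:dS, dS:], prec[dS:, dS:]
    cS = XS - mu[:dS]  # centered S features, one row per S tuple
    cR = XR - mu[dS:]  # centered R features, one row per R tuple
    # R-only terms: computed once per R tuple, then reused via the foreign key.
    r_quad = np.einsum("ij,jk,ik->i", cR, P_RR, cR)  # c_R^T P_RR c_R
    r_proj = cR @ P_SR.T                              # rows are P_SR c_R, shape (nR, dS)
    s_quad = np.einsum("ij,jk,ik->i", cS, P_SS, cS)  # c_S^T P_SS c_S
    cross = np.einsum("ij,ij->i", cS, r_proj[fk])    # c_S^T P_SR c_R
    # P symmetric => the two off-diagonal blocks contribute 2 * c_S^T P_SR c_R.
    return s_quad + 2.0 * cross + r_quad[fk]


def first_layer_factorized(XS, fk, XR, W, b):
    """Pre-activation W x + b of a dense first layer, with W split as [W_S | W_R]."""
    dS = XS.shape[1]
    WS, WR = W[:, :dS], W[:, dS:]
    r_part = XR @ WR.T  # computed once per R tuple, shape (nR, h)
    return XS @ WS.T + r_part[fk] + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nS, nR, dS, dR, h = 1000, 20, 3, 4, 8
    XS = rng.normal(size=(nS, dS))
    XR = rng.normal(size=(nR, dR))
    fk = rng.integers(0, nR, size=nS)  # foreign key of each S tuple into R
    X = join_materialized(XS, fk, XR)  # used only to check the factorized results

    A = rng.normal(size=(dS + dR, dS + dR))
    prec = A @ A.T + np.eye(dS + dR)  # symmetric positive definite precision matrix
    mu = rng.normal(size=dS + dR)
    ref = np.einsum("ij,jk,ik->i", X - mu, prec, X - mu)
    print(np.allclose(gmm_quadform_factorized(XS, fk, XR, mu, prec), ref))

    W, b = rng.normal(size=(h, dS + dR)), rng.normal(size=h)
    print(np.allclose(first_layer_factorized(XS, fk, XR, W, b), X @ W.T + b))
```

Only the first-layer pre-activation splits this way: once a nonlinearity such as `np.tanh` is applied, the S and R contributions are mixed, which is consistent with the observation that exploiting redundancy at higher layers does not pay off. In the first layer, the R-side product costs about nR·dR·h multiply-adds instead of nS·dR·h on the materialized join.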

Related research

Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec (2019)

Incremental gradient (IG) methods, such as stochastic gradient descent and its variants, are commonly used for large-scale optimization in machine learning. Despite the sustained effort to make IG methods more data-efficient, it remains an open question how to select a training data subset that can theoretically and practically perform on par with the full dataset. Here we develop CRAIG, a method to select a weighted subset (or coreset) of training data that closely estimates the full gradient by maximizing a submodular function. We prove that applying IG to this subset is guaranteed to converge to the (near-)optimal solution with the same convergence rate as that of IG for convex optimization. As a result, CRAIG achieves a speedup that is inversely proportional to the size of the subset. To our knowledge, this is the first rigorous method for data-efficient training of general machine learning models. Our extensive set of experiments shows that CRAIG, while achieving practically the same solution, speeds up various IG methods by up to 6x for logistic regression and 3x for training deep neural networks.

Machine Learning, Artificial Intelligence
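The coreset selection in the CRAIG abstract can be sketched as greedy maximization of a facility-location function, a monotone submodular objective of the kind the abstract refers to. This is a minimal NumPy sketch, not the CRAIG reference implementation: it assumes that pairwise Euclidean distance between feature vectors stands in for the bound on gradient differences, uses the plain greedy algorithm, and builds a dense n×n distance matrix, so it only suits small n. The function name `craig_greedy` is hypothetical. Each selected element's weight is the number of points whose nearest selected element it is.

```python
import numpy as np


def craig_greedy(features, k):
    """Greedily pick k representatives maximizing a facility-location objective.

    Assumption: Euclidean feature distance is used as a proxy for the
    gradient-difference bound; requires k <= number of rows.
    """
    n = features.shape[0]
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sim = dist.max() - dist  # nonnegative similarity, so F is monotone submodular
    best = np.zeros(n)       # best[i] = max over selected j of sim[i, j] (0 for the empty set)
    selected = []
    for _ in range(k):
        # Marginal gain of each candidate j: sum_i max(sim[i, j] - best[i], 0).
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        gains[selected] = -np.inf  # never pick the same element twice
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    # Weight of each representative = number of points assigned to it (nearest selected element).
    assign = np.argmin(dist[:, selected], axis=1)
    weights = np.bincount(assign, minlength=k)
    return np.array(selected), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
    idx, w = craig_greedy(X, 2)
    print(idx, w)  # one representative per cluster, each with weight 20
```

Greedy selection carries the usual (1 − 1/e) approximation guarantee for monotone submodular maximization under a cardinality constraint; the weights then scale each selected example's gradient during the incremental gradient steps.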

Quinoa: a Q-function You Infer Normalized Over Actions

Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg (2019)

We present an algorithm for learning an approximate action-value soft Q-function in the relative-entropy-regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form. We use recent advances in normalising flows for parametrising the policy together with a learned value-function, and show how this combination can be used to implicitly represent Q-values of an arbitrary policy in continuous action space. Using simple temporal difference learning on the Q-values then leads to a unified objective for policy and value learning. We show how this approach considerably simplifies standard Actor-Critic off-policy algorithms, removing the need for a policy optimisation step. We perform experiments on a range of established reinforcement learning benchmarks, demonstrating that our approach allows for complex, multimodal policy distributions in continuous action spaces, while keeping the process of sampling from the policy both fast and exact.

Machine Learning, Neural and Evolutionary Computing
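For a discrete action set, the closed-form policy improvement and the implicit Q-value representation described in the Quinoa abstract can be illustrated with the standard identity of KL-regularised RL: the improved policy is π(a) ∝ π₀(a) exp(Q(a)/α), and conversely Q(a) = α log(π(a)/π₀(a)) + V with V = α log Σₐ π₀(a) exp(Q(a)/α). The sketch below checks that identity for a single state; it assumes a discrete action set and a fixed temperature α, whereas Quinoa itself targets continuous actions with normalising-flow policies. The function names are hypothetical and not taken from the paper.

```python
import numpy as np


def improved_policy(q, prior, alpha):
    """Closed-form maximizer of E_pi[Q] - alpha * KL(pi || prior) over discrete actions."""
    logits = np.log(prior) + q / alpha
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    return p / p.sum()


def soft_value(q, prior, alpha):
    """V = alpha * log sum_a prior(a) * exp(Q(a) / alpha), via a stabilized log-sum-exp."""
    z = np.log(prior) + q / alpha
    m = z.max()
    return alpha * (m + np.log(np.exp(z - m).sum()))


def implied_q(policy, prior, value, alpha):
    """Recover Q-values from a policy and a value: Q(a) = alpha * log(pi(a) / prior(a)) + V."""
    return alpha * np.log(policy / prior) + value


if __name__ == "__main__":
    q = np.array([1.0, 0.5, -2.0, 3.0])
    prior = np.full(4, 0.25)  # uniform reference policy
    alpha = 0.7
    pi = improved_policy(q, prior, alpha)
    v = soft_value(q, prior, alpha)
    print(np.allclose(implied_q(pi, prior, v, alpha), q))  # round trip recovers Q
```

The round trip shows why a policy plus a value function suffices to represent Q implicitly: any pair (π, V) defines Q through `implied_q`, and the improved policy for that Q is π itself.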

Clock synchronization over networks using sawtooth models

Pol del Aguila Pla, Lissy Pellaco, Satyam Dwivedi (2019)

Clock synchronization and ranging over a wireless network with low communication overhead is a challenging goal with tremendous impact. In this paper, we study the use of time-to-digital converters in wireless sensors, which provide clock synchronization and ranging at negligible communication overhead through a sawtooth signal model for round-trip times between two nodes. In particular, we derive Cramér-Rao lower bounds for a linearization of the sawtooth signal model, and we thoroughly evaluate simple estimation techniques by simulation, giving clear and concise performance references for this technology.

Signal Processing, Databases, Systems and Control

Interpretable Dynamics Models for Data-Efficient Reinforcement Learning

Markus Kaiser, Clemens Otte, Thomas Runkler (2019)

In this paper, we present a Bayesian view on model-based reinforcement learning. We use expert knowledge to impose structure on the transition model and present an efficient learning scheme based on variational inference. This scheme is applied to a heteroskedastic and bimodal benchmark problem on which we compare our results to NFQ and show how our approach yields human-interpretable insight about the underlying dynamics while also increasing data-efficiency.

Machine Learning

Equi-Joins Over Encrypted Data for Series of Queries

Masoumeh Shafieinejad (2021)

Encryption provides a method to protect data outsourced to a DBMS provider, e.g., in the cloud. However, performing database operations over encrypted data requires specialized encryption schemes that carefully balance security and performance. In this paper, we present a new encryption scheme that can efficiently perform equi-joins over encrypted data with better security than the state of the art. In particular, our encryption scheme reduces the leakage to equality of rows that match a selection criterion and only reveals the transitive closure of the sum of the leakages of each query in a series of queries. Our encryption scheme is provably secure. We implemented our encryption scheme and evaluated it over a dataset from the TPC-H benchmark.

Cryptography and Security, Databases