Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Explaining Black Boxes on Sequential Data using Weighted Automata

63 0 0.0 ( 0 )

Download Cite

Added by R\\'emi Eyraud

Publication date 2018

fields Informatics Engineering Mathematical Statistics

and research's language is English

Authors Stephane Ayache - Remi Eyraud - Noe Goudian

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Understanding how a learned black box works is of crucial interest for the future of Machine Learning. In this paper, we pioneer the question of the global interpretability of learned black box models that assign numerical values to symbolic sequential data. To tackle that task, we propose a spectral algorithm for the extraction of weighted automata (WA) from such black boxes. This algorithm does not require the access to a dataset or to the inner representation of the black box: the inferred model can be obtained solely by querying the black box, feeding it with inputs and analyzing its outputs. Experiments using Recurrent Neural Networks (RNN) trained on a wide collection of 48 synthetic datasets and 2 real datasets show that the obtained approximation is of great quality.

rate research

Synthetic Data Generators: Sequential and Private

72 - Olivier Bousquet , Roi Livni , Shay Moran 2019

We study the sample complexity of private synthetic data generation over an unbounded sized class of statistical queries, and show that any class that is privately proper PAC learnable admits a private synthetic data generator (perhaps non-efficient). Previous work on synthetic data generators focused on the case that the query class $mathcal{D}$ is finite and obtained sample complexity bounds that scale logarithmically with the size $|mathcal{D}|$. Here we construct a private synthetic data generator whose sample complexity is independent of the domain size, and we replace finiteness with the assumption that $mathcal{D}$ is privately PAC learnable (a formally weaker task, hence we obtain equivalence between the two tasks).

Machine Learning Machine Learning

On Detecting Data Pollution Attacks On Recommender Systems Using Sequential GANs

87 - Behzad Shahrasbi , Venugopal Mani , Apoorv Reddy Arrabothu 2020

Recommender systems are an essential part of any e-commerce platform. Recommendations are typically generated by aggregating large amounts of user data. A malicious actor may be motivated to sway the output of such recommender systems by injecting malicious datapoints to leverage the system for financial gain. In this work, we propose a semi-supervised attack detection algorithm to identify the malicious datapoints. We do this by leveraging a portion of the dataset that has a lower chance of being polluted to learn the distribution of genuine datapoints. Our proposed approach modifies the Generative Adversarial Network architecture to take into account the contextual information from user activity. This allows the model to distinguish legitimate datapoints from the injected ones.

Machine Learning

Small Boxes Big Data: A Deep Learning Approach to Optimize Variable Sized Bin Packing

77 - Feng Mao , Edgar Blanco , Mingang Fu 2017

Bin Packing problems have been widely studied because of their broad applications in different domains. Known as a set of NP-hard problems, they have different vari- ations and many heuristics have been proposed for obtaining approximate solutions. Specifically, for the 1D variable sized bin packing problem, the two key sets of optimization heuristics are the bin assignment and the bin allocation. Usually the performance of a single static optimization heuristic can not beat that of a dynamic one which is tailored for each bin packing instance. Building such an adaptive system requires modeling the relationship between bin features and packing perform profiles. The primary drawbacks of traditional AI machine learnings for this task are the natural limitations of feature engineering, such as the curse of dimensionality and feature selection quality. We introduce a deep learning approach to overcome the drawbacks by applying a large training data set, auto feature selection and fast, accurate labeling. We show in this paper how to build such a system by both theoretical formulation and engineering practices. Our prediction system achieves up to 89% training accuracy and 72% validation accuracy to select the best heuristic that can generate a better quality bin packing solution.

Machine Learning Machine Learning

Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach

184 - Remi Eyraud , Stephane Ayache 2020

This paper is an attempt to bridge the gap between deep learning and grammatical inference. Indeed, it provides an algorithm to extract a (stochastic) formal language from any recurrent neural network trained for language modelling. In detail, the algorithm uses the already trained network as an oracle -- and thus does not require the access to the inner representation of the black-box -- and applies a spectral approach to infer a weighted automaton. As weighted automata compute linear functions, they are computationally more efficient than neural networks and thus the nature of the approach is the one of knowledge distillation. We detail experiments on 62 data sets (both synthetic and from real-world applications) that allow an in-depth study of the abilities of the proposed algorithm. The results show the WA we extract are good approximations of the RNN, validating the approach. Moreover, we show how the process provides interesting insights toward the behavior of RNN learned on data, enlarging the scope of this work to the one of explainability of deep learning models.

Machine Learning Formal Languages and Automata Theory Machine Learning

Weighted boxes fusion: Ensembling boxes from different object detection models

119 - Roman Solovyev , Weimin Wang , Tatiana Gabruseva 2019

In this work, we present a novel method for combining predictions of object detection models: weighted boxes fusion. Our algorithm utilizes confidence scores of all proposed bounding boxes to constructs the averaged boxes. We tested method on several datasets and evaluated it in the context of the Open Images and COCO Object Detection tracks, achieving top results in these challenges. The source code is publicly available at https://github.com/ZFTurbo/Weighted-Boxes-Fusion

Computer Vision and Pattern Recognition

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Explaining Black Boxes on Sequential Data using Weighted Automata

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions