
Synthetic Data Generators: Sequential and Private

Posted by Roi Livni
Publication date: 2019
Paper language: English





We study the sample complexity of private synthetic data generation over an unbounded sized class of statistical queries, and show that any class that is privately proper PAC learnable admits a private synthetic data generator (perhaps non-efficient). Previous work on synthetic data generators focused on the case that the query class $\mathcal{D}$ is finite and obtained sample complexity bounds that scale logarithmically with the size $|\mathcal{D}|$. Here we construct a private synthetic data generator whose sample complexity is independent of the domain size, and we replace finiteness with the assumption that $\mathcal{D}$ is privately PAC learnable (a formally weaker task, hence we obtain equivalence between the two tasks).
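To make the setup concrete, here is a minimal sketch (not the paper's construction) of what a synthetic data generator must achieve: for every statistical query in the class, the answer on the synthetic dataset should be within some accuracy alpha of the answer on the real dataset. All names below (query_answer, is_alpha_accurate, the threshold queries) are illustrative assumptions.

```python
# Minimal sketch of alpha-accuracy of synthetic data for a class of
# statistical (counting) queries; illustrative only, not the paper's method.

def query_answer(q, data):
    """Empirical answer of a statistical query q: x -> {0, 1} on a dataset."""
    return sum(q(x) for x in data) / len(data)

def is_alpha_accurate(real_data, synthetic_data, queries, alpha):
    """True if the synthetic data preserves every query's answer up to alpha."""
    return all(
        abs(query_answer(q, real_data) - query_answer(q, synthetic_data)) <= alpha
        for q in queries
    )

# Toy example: one-dimensional threshold queries over points in [0, 1].
real = [0.1, 0.35, 0.5, 0.8, 0.92]
synthetic = [0.12, 0.4, 0.55, 0.78, 0.9]
thresholds = [lambda x, t=t: float(x <= t) for t in (0.25, 0.5, 0.75)]
print(is_alpha_accurate(real, synthetic, thresholds, alpha=0.21))  # True
```

The paper's contribution is showing that, for classes that are privately (properly) PAC learnable, such a generator exists with sample complexity independent of the domain size and can itself satisfy differential privacy; the sketch above only states the accuracy requirement, not the private construction.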


Read also

We study private synthetic data generation for query release, where the goal is to construct a sanitized version of a sensitive dataset, subject to differential privacy, that approximately preserves the answers to a large collection of statistical queries. We first present an algorithmic framework that unifies a long line of iterative algorithms in the literature. Under this framework, we propose two new methods. The first method, private entropy projection (PEP), can be viewed as an advanced variant of MWEM that adaptively reuses past query measurements to boost accuracy. Our second method, generative networks with the exponential mechanism (GEM), circumvents computational bottlenecks in algorithms such as MWEM and PEP by optimizing over generative models parameterized by neural networks, which capture a rich family of distributions while enabling fast gradient-based optimization. We demonstrate that PEP and GEM empirically outperform existing algorithms. Furthermore, we show that GEM nicely incorporates prior information from public data while overcoming limitations of PMW^Pub, the existing state-of-the-art method that also leverages public data.
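As a rough illustration of the iterative select/measure/update loop that this framework unifies, here is a toy MWEM-style sketch over a small discrete domain. It is not the authors' PEP or GEM implementation, and the privacy budgeting and noise calibration are deliberately simplified assumptions.

```python
# Toy MWEM-style loop: select a high-error query (exponential mechanism),
# measure it with Laplace noise, update the synthetic distribution with
# multiplicative weights. Simplified accounting; illustrative only.
import math
import random

def mwem(domain, real_hist, queries, rounds, epsilon):
    n = sum(real_hist)
    synth = [1.0 / len(domain)] * len(domain)      # uniform start
    eps_round = epsilon / (2 * rounds)             # split budget: select + measure

    def answer(q, weights_or_counts):
        total = sum(weights_or_counts)
        return sum(w for x, w in zip(domain, weights_or_counts) if q(x)) / total

    for _ in range(rounds):
        # Select: exponential mechanism biased toward high-error queries.
        errors = [abs(answer(q, real_hist) - answer(q, synth)) for q in queries]
        top = max(errors)
        scores = [math.exp(eps_round * n * (e - top) / 2) for e in errors]
        q = random.choices(queries, weights=scores)[0]

        # Measure: noisy answer to the selected query (Laplace noise).
        noise = random.choice([-1, 1]) * random.expovariate(eps_round * n)
        noisy = answer(q, real_hist) + noise

        # Update: multiplicative weights toward the noisy measurement.
        err = noisy - answer(q, synth)
        synth = [p * math.exp(err * (1.0 if q(x) else 0.0) / 2)
                 for x, p in zip(domain, synth)]
        total = sum(synth)
        synth = [p / total for p in synth]
    return synth
```

Per the abstract, PEP replaces the single-query update with a step that adaptively reuses past measurements, and GEM replaces the explicit distribution `synth` with a neural generator trained by gradient-based optimization, which is what makes large domains tractable.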
Deep learning continues to revolutionize an ever-growing number of critical application areas including healthcare, transportation, finance, and basic sciences. Despite their increased predictive power, model transparency and human explainability remain a significant challenge due to the black box nature of modern deep learning models. In many cases the desired balance between interpretability and performance is predominately task specific. Human-centric domains such as healthcare necessitate a renewed focus on understanding how and why these frameworks are arriving at critical and potentially life-or-death decisions. Given the quantity of research and empirical successes of deep learning for computer vision, most of the existing interpretability research has focused on image processing techniques. Comparatively, less attention has been paid to interpreting deep learning frameworks using sequential data. Given recent deep learning advancements in highly sequential domains such as natural language processing and physiological signal processing, the need for deep sequential explanations is at an all-time high. In this paper, we review current techniques for interpreting deep learning techniques involving sequential data, identify similarities to non-sequential methods, and discuss current limitations and future avenues of sequential interpretability research.
Understanding how a learned black box works is of crucial interest for the future of Machine Learning. In this paper, we pioneer the question of the global interpretability of learned black box models that assign numerical values to symbolic sequential data. To tackle that task, we propose a spectral algorithm for the extraction of weighted automata (WA) from such black boxes. This algorithm does not require access to a dataset or to the inner representation of the black box: the inferred model can be obtained solely by querying the black box, feeding it with inputs and analyzing its outputs. Experiments using Recurrent Neural Networks (RNN) trained on a wide collection of 48 synthetic datasets and 2 real datasets show that the obtained approximation is of high quality.
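The spectral idea can be sketched in a few lines: query the black box on prefix-suffix concatenations to fill a Hankel matrix, factor it with an SVD, and read off the weighted-automaton parameters from shifted Hankel blocks. The function and variable names below (black_box, prefixes, suffixes) are placeholders, not the authors' code, and the basis is assumed to contain the empty word.

```python
# Sketch of spectral extraction of a weighted automaton (WA) from a
# sequence-to-value black box; illustrative assumptions throughout.
import numpy as np

def extract_wa(black_box, alphabet, prefixes, suffixes, rank):
    """prefixes/suffixes are lists of strings, both containing "" (empty word);
    black_box maps a string to a real number."""
    H = np.array([[black_box(p + s) for s in suffixes] for p in prefixes])
    U, S, Vt = np.linalg.svd(H, full_matrices=False)
    P = U[:, :rank] * S[:rank]          # rank-r factorization H ≈ P @ Q
    Q = Vt[:rank, :]
    P_pinv, Q_pinv = np.linalg.pinv(P), np.linalg.pinv(Q)

    # One transition matrix per symbol, from the shifted Hankel blocks.
    A = {sigma: P_pinv @ np.array([[black_box(p + sigma + s) for s in suffixes]
                                   for p in prefixes]) @ Q_pinv
         for sigma in alphabet}
    alpha = P[prefixes.index("")]       # initial weight vector
    omega = Q[:, suffixes.index("")]    # final weight vector
    return alpha, A, omega

def wa_value(alpha, A, omega, word):
    """Evaluate the extracted WA on a string."""
    v = alpha
    for sigma in word:
        v = v @ A[sigma]
    return float(v @ omega)
```

With an RNN as the black box, `black_box` would wrap a forward pass mapping a sequence to its scalar output; the quality of the approximation then depends on the chosen prefix/suffix basis and the rank.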
Interpretable predictions, where it is clear why a machine learning model has made a particular decision, can compromise privacy by revealing the characteristics of individual data points. This raises the central question addressed in this paper: can models be interpretable without compromising privacy? For complex big data fit by correspondingly rich models, balancing privacy and explainability is particularly challenging, so this question has remained largely unexplored. In this paper, we propose a family of simple models that approximate complex models using several locally linear maps per class, providing high classification accuracy as well as differentially private explanations of the classification. We illustrate the usefulness of our approach on several image benchmark datasets as well as a medical dataset.
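A small sketch of the "several locally linear maps per class" idea: each class keeps a few linear maps, its score is the best response among them, so the explanation for a prediction is a single linear map. The shapes, gating rule, and names below are assumptions for illustration, and the differential-privacy mechanism from the paper is omitted.

```python
# Toy classifier built from K locally linear maps per class; illustrative only.
import numpy as np

def predict_lll(x, maps, biases):
    """maps[c] has shape (K, d): K local linear maps for class c.
    The class score is the max response among that class's local maps."""
    scores = [float(np.max(W @ x + b)) for W, b in zip(maps, biases)]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
d, K, classes = 10, 3, 4
maps = [rng.normal(size=(K, d)) for _ in range(classes)]
biases = [rng.normal(size=K) for _ in range(classes)]
label, scores = predict_lll(rng.normal(size=d), maps, biases)
print(label, [round(s, 2) for s in scores])
```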
Generative Adversarial Networks (GANs) have become the gold standard when it comes to learning generative models for high-dimensional distributions. Since their advent, numerous variations of GANs have been introduced in the literature, primarily focusing on utilization of novel loss functions, optimization/regularization strategies and network architectures. In this paper, we turn our attention to the generator and investigate the use of high-order polynomials as an alternative class of universal function approximators. Concretely, we propose PolyGAN, where we model the data generator by means of a high-order polynomial whose unknown parameters are naturally represented by high-order tensors. We introduce two tensor decompositions that significantly reduce the number of parameters and show how they can be efficiently implemented by hierarchical neural networks that only employ linear/convolutional blocks. We exhibit for the first time that by using our approach a GAN generator can approximate the data distribution without using any activation functions. Thorough experimental evaluation on both synthetic and real data (images and 3D point clouds) demonstrates the merits of PolyGAN against the state of the art.
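In the same spirit, a toy generator built only from linear layers and Hadamard products (no activation functions) looks roughly as follows; the layer sizes and the particular recursion are illustrative assumptions, not the exact PolyGAN parameterization or its tensor decompositions.

```python
# Toy polynomial generator: linear layers plus Hadamard products, no
# activation functions; a loose illustration, not the paper's model.
import torch
import torch.nn as nn

class PolynomialGenerator(nn.Module):
    def __init__(self, noise_dim=64, hidden_dim=128, out_dim=784, degree=3):
        super().__init__()
        # One linear "mode" per extra polynomial degree in the noise z.
        self.modes = nn.ModuleList(nn.Linear(noise_dim, hidden_dim) for _ in range(degree))
        self.skip = nn.Linear(noise_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, out_dim)

    def forward(self, z):
        h = self.skip(z)
        for mode in self.modes:
            h = h * mode(z) + h   # Hadamard product raises the degree in z
        return self.out(h)

g = PolynomialGenerator()
fake = g(torch.randn(16, 64))     # a batch of 16 generated samples
print(fake.shape)                 # torch.Size([16, 784])
```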
