This note considers softmax parameter estimation when little or no labeled training data is available, but a priori information about the relative geometry of class label log-odds boundaries is available. It is shown that data-free softmax model synthesis corresponds to solving a linear system of parameter equations, wherein desired dominant class log-odds boundaries are encoded via convex polytopes that decompose the input feature space. When solvable, the linear equations yield closed-form softmax parameter solution families from the class boundary polytope specifications alone. This allows softmax parameter learning to be implemented without expensive brute-force data sampling and numerical optimization. The linear equations can also be adapted to constrained maximum likelihood estimation in data-sparse settings. Since solutions may fail to exist for the linear parameter equations derived from certain polytope specifications, it is also shown that there exist probabilistic classification problems over m convexly separable classes for which the log-odds boundaries cannot be learned using an m-class softmax model.
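As an illustration of the core idea, the minimal sketch below encodes two desired pairwise log-odds boundaries as linear equality constraints on softmax weights and biases and solves the resulting (underdetermined) system by least squares, returning one member of a solution family. The specific hyperplanes, the unit scale fixed on each constraint, and the three-class, two-feature setup are illustrative assumptions, not the paper's polytope-based formulation.

```python
# Minimal sketch (not the paper's exact formulation): recover softmax
# parameters whose pairwise log-odds boundaries match prescribed hyperplanes.
import numpy as np

d, m = 2, 3                                  # feature dimension, number of classes
# Desired class-(i, j) boundaries: a_ij . x + c_ij = 0 (illustrative choices)
boundaries = {
    (0, 1): (np.array([1.0, 0.0]), 0.0),     # first coordinate = 0 separates classes 0 and 1
    (1, 2): (np.array([0.0, 1.0]), -1.0),    # second coordinate = 1 separates classes 1 and 2
}

# Unknowns: theta = [w_0, b_0, ..., w_{m-1}, b_{m-1}], length m * (d + 1)
n_par = m * (d + 1)
rows, rhs = [], []
for (i, j), (a, c) in boundaries.items():
    # Softmax log-odds boundary: (w_i - w_j) . x + (b_i - b_j) = 0.
    # Enforce w_i - w_j = a and b_i - b_j = c (scale fixed to 1 for this sketch).
    for k in range(d):
        row = np.zeros(n_par)
        row[i * (d + 1) + k] = 1.0
        row[j * (d + 1) + k] = -1.0
        rows.append(row); rhs.append(a[k])
    row = np.zeros(n_par)
    row[i * (d + 1) + d] = 1.0
    row[j * (d + 1) + d] = -1.0
    rows.append(row); rhs.append(c)

A, r = np.vstack(rows), np.array(rhs)
theta, *_ = np.linalg.lstsq(A, r, rcond=None)    # one member of the solution family
W = theta.reshape(m, d + 1)[:, :d]
b = theta.reshape(m, d + 1)[:, d]

x = np.array([0.0, 0.3])                         # a point on the class-0/1 boundary
logits = W @ x + b
print(np.isclose(logits[0], logits[1]))          # True: equal log-odds on the boundary
```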
We address two shortcomings of online travel time estimation methods for congested urban traffic. The first concerns the determination of the number of mixture modes, which can change dynamically, both within a day and from day to day. The second is the widespread use of Gaussian probability densities as mixture components. Gaussian densities fail to capture the positive skew of travel time distributions, so large numbers of components are needed for reasonable fitting accuracy, and they assign positive probability to negative travel times. To address these issues, this paper derives a mixture distribution with Gamma component densities, which are asymmetric and supported on the positive numbers. We use sparse estimation techniques to ensure parsimonious models and propose a generalization of Gamma mixture densities based on Mittag-Leffler functions, which provides enhanced fitting flexibility and improved parsimony. To accommodate within-day variability and allow for online implementation of the proposed methodology (i.e., fast computations on streaming travel time data), we introduce a recursive algorithm that efficiently updates the fitted distribution whenever new data become available. Experimental results on real-world travel time data illustrate the efficacy of the proposed methods.
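The following is a minimal sketch of fitting a Gamma mixture to positively skewed travel times, using batch EM with a method-of-moments M-step. The synthetic data, two-component setup, and update rule are illustrative assumptions; the paper's sparse estimation, recursive online updating, and Mittag-Leffler generalization are not reproduced here.

```python
# Simplified batch EM for a two-component Gamma mixture (method-of-moments M-step).
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
# Synthetic "travel times": two positively skewed regimes (free-flow, congested)
x = np.concatenate([rng.gamma(4.0, 2.0, 700), rng.gamma(9.0, 3.0, 300)])

K = 2
pi = np.full(K, 1.0 / K)                     # mixture weights
shape = np.array([2.0, 8.0])                 # initial Gamma shapes
scale = np.array([1.0, 4.0])                 # initial Gamma scales

for _ in range(200):
    # E-step: responsibility of each component for each observation
    dens = np.stack([pi[k] * gamma.pdf(x, a=shape[k], scale=scale[k]) for k in range(K)])
    resp = dens / dens.sum(axis=0, keepdims=True)
    # M-step: weighted method-of-moments updates (shape = mean^2/var, scale = var/mean)
    for k in range(K):
        w = resp[k] / resp[k].sum()
        mu = np.sum(w * x)
        var = np.sum(w * (x - mu) ** 2)
        shape[k], scale[k] = mu ** 2 / var, var / mu
    pi = resp.mean(axis=1)

print("weights:", np.round(pi, 3), "shapes:", np.round(shape, 2), "scales:", np.round(scale, 2))
```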
The Gumbel-Max trick is the basis of many relaxed gradient estimators. These estimators are easy to implement and have low variance, but the goal of scaling them to large combinatorial distributions is still outstanding. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces. Our framework provides a unified perspective on existing relaxed estimators for perturbation models, and it contains many novel relaxations. We design structured relaxations for subset selection, spanning trees, arborescences, and other combinatorial structures. When compared to less structured baselines, we find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
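The sketch below shows the unstructured case that this framework generalizes: a Gumbel-Softmax relaxation of a single categorical variable, where Gumbel-perturbed logits pass through a temperature-controlled softmax instead of an argmax. The temperature and logits are illustrative; the structured relaxations for subsets, spanning trees, and arborescences are not shown.

```python
# Minimal Gumbel-Softmax relaxation of a categorical sample.
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=np.random.default_rng(0)):
    """Return a relaxed one-hot vector by perturbing logits with Gumbel noise."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))   # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())                                 # numerically stable softmax
    return y / y.sum()

logits = np.array([1.0, 2.0, 0.5])
soft_sample = gumbel_softmax(logits)     # differentiable relaxation
hard_sample = np.argmax(soft_sample)     # Gumbel-Max sample from the same perturbation
print(soft_sample, hard_sample)
```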
This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the regression coefficients, and the proposed algorithm jointly learns the low-dimensional structure of the data and a linear regressor with sparse coefficients. The proposed stochastic optimization method, Sparse Linear Regression with Missing Data (SLRM), performs an alternating minimization procedure and scales well with the problem size. Large deviation inequalities shed light on the impact of the various problem-dependent parameters on the expected squared loss of the learned regressor. Extensive simulations on both synthetic and real datasets show that SLRM performs better than competing algorithms in a variety of contexts.
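As a rough sketch of the alternating idea (not the paper's SLRM algorithm, which is stochastic), the code below alternates a rank-r reconstruction of the incomplete design matrix with a lasso fit of the regression coefficients. The synthetic approximately low-rank data, masking rate, rank, and regularization strength are all illustrative assumptions.

```python
# Simplified batch alternation: low-rank imputation of missing entries + sparse regression.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, r = 200, 50, 5
U, V = rng.normal(size=(n, r)), rng.normal(size=(r, d))
X_true = U @ V + 0.1 * rng.normal(size=(n, d))       # approximately low-rank design
beta_true = np.zeros(d); beta_true[:5] = 1.0         # sparse coefficients
y = X_true @ beta_true + 0.01 * rng.normal(size=n)

mask = rng.uniform(size=(n, d)) < 0.7                # observed entries
X_obs = np.where(mask, X_true, 0.0)

X_hat = X_obs.copy()
for _ in range(10):
    # (a) low-rank step: project the current imputation onto rank r
    Uk, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    X_low = (Uk[:, :r] * s[:r]) @ Vt[:r]
    X_hat = np.where(mask, X_obs, X_low)             # keep observed entries fixed
    # (b) sparse regression step on the completed matrix
    model = Lasso(alpha=0.01).fit(X_hat, y)

# Compare with the true support {0, ..., 4}
print("five largest |coefficients| at indices:", np.argsort(-np.abs(model.coef_))[:5])
```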
We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) $\ell_\infty$ robustness against several strong attacks via adversarial training and (ii) certified $\ell_2$ and $\ell_\infty$ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.
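A minimal sketch of self-training in the Gaussian model mentioned above, under illustrative dimensions, noise level, and perturbation radius: a linear classifier estimated from a few labels pseudo-labels a large unlabeled pool, and the re-estimated classifier is evaluated for standard and $\ell_\infty$-robust accuracy (for a linear classifier, robust correctness reduces to a margin condition involving $\|w\|_1$).

```python
# Self-training sketch in the two-class Gaussian model x ~ N(y * mu, sigma^2 I), y in {-1, +1}.
import numpy as np

rng = np.random.default_rng(0)
d, sigma, eps = 100, 1.0, 0.05                   # illustrative settings
mu = 2.0 * np.ones(d) / np.sqrt(d)               # class mean direction

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + sigma * rng.normal(size=(n, d))
    return x, y

x_lab, y_lab = sample(20)                        # scarce labels
x_unl, _ = sample(5000)                          # plentiful unlabeled data
x_test, y_test = sample(2000)

w_sup = (y_lab[:, None] * x_lab).mean(axis=0)    # supervised estimate of mu
pseudo = np.sign(x_unl @ w_sup)                  # pseudo-label the unlabeled pool
w_self = (pseudo[:, None] * x_unl).mean(axis=0)  # self-trained estimate

for name, w in [("labels only", w_sup), ("self-trained", w_self)]:
    std_acc = np.mean(np.sign(x_test @ w) == y_test)
    # An l_inf perturbation of size eps shifts a linear score by at most eps * ||w||_1.
    rob_acc = np.mean(y_test * (x_test @ w) > eps * np.abs(w).sum())
    print(f"{name}: standard {std_acc:.3f}, robust {rob_acc:.3f}")
```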
Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data suitable for querying the target model. Existing attacks either assume partial access to the dataset of the target model or the availability of an alternate dataset with semantic similarities. This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model with a disagreement objective to produce inputs that maximize the disagreement between the clone and the target model. However, unlike the white-box setting of KD, where gradient information is available, training a generator for model stealing requires black-box optimization, since it involves querying the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization, enabling a highly accurate MS attack. Our evaluation on four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x and outperforms even recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) or surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the queries required for the attack by 2x-24x.
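The sketch below isolates the kind of zeroth-order gradient estimator MAZE relies on to optimize the generator through the black-box target: forward differences of the objective along random directions. The quadratic toy objective stands in for the clone/target disagreement loss; the number of directions, smoothing parameter, and step size are illustrative assumptions.

```python
# Zeroth-order (forward-difference) gradient estimation for a black-box objective.
import numpy as np

rng = np.random.default_rng(0)

def black_box_loss(theta):
    """Toy stand-in for an objective that can only be queried, not differentiated."""
    return np.sum((theta - 3.0) ** 2)

def zeroth_order_grad(loss, theta, n_dirs=50, mu=1e-3):
    """Estimate the gradient from forward differences along random Gaussian directions."""
    base = loss(theta)
    grad = np.zeros_like(theta)
    for _ in range(n_dirs):
        u = rng.normal(size=theta.shape)
        grad += (loss(theta + mu * u) - base) / mu * u
    return grad / n_dirs

theta = np.zeros(10)                       # e.g. generator parameters
for _ in range(200):
    theta -= 0.05 * zeroth_order_grad(black_box_loss, theta)

print(black_box_loss(theta))               # close to 0: theta has moved near the optimum
```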