Subscribe to the gold package and get unlimited access to Shamra Academy

Probabilistic Structured Predictors

403 0 0.0 ( 0 )

Download Cite

Added by Shankar Vembu

Publication date 2012

fields Informatics Engineering

and research's language is English

Authors Shankar Vembu - Thomas Gartner - Mario Boley

Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We consider MAP estimators for structured prediction with exponential family models. In particular, we concentrate on the case that efficient algorithms for uniform sampling from the output space exist. We show that under this assumption (i) exact computation of the partition function remains a hard problem, and (ii) the partition function and the gradient of the log partition function can be approximated efficiently. Our main result is an approximation scheme for the partition function based on Markov Chain Monte Carlo theory. We also show that the efficient uniform sampling assumption holds in several application settings that are of importance in machine learning.

rate research

Strudel: Learning Structured-Decomposable Probabilistic Circuits

151 - Meihua Dang , Antonio Vergari , Guy Van den Broeck 2020

Probabilistic circuits (PCs) represent a probability distribution as a computational graph. Enforcing structural properties on these graphs guarantees that several inference scenarios become tractable. Among these properties, structured decomposability is a particularly appealing one: it enables the efficient and exact computations of the probability of complex logical formulas, and can be used to reason about the expected output of certain predictive models under missing data. This paper proposes Strudel, a simple, fast and accurate learning algorithm for structured-decomposable PCs. Compared to prior work for learning structured-decomposable PCs, Strudel delivers more accurate single PC models in fewer iterations, and dramatically scales learning when building ensembles of PCs. It achieves this scalability by exploiting another structural property of PCs, called determinism, and by sharing the same computational graph across mixture components. We show these advantages on standard density estimation benchmarks and challenging inference scenarios.

Machine Learning Artificial Intelligence

Structured Probabilistic Pruning for Convolutional Neural Network Acceleration

136 - Huan Wang , Qiming Zhang , Yuehai Wang 2017

In this paper, we propose a novel progressive parameter pruning method for Convolutional Neural Network acceleration, named Structured Probabilistic Pruning (SPP), which effectively prunes weights of convolutional layers in a probabilistic manner. Unlike existing deterministic pruning approaches, where unimportant weights are permanently eliminated, SPP introduces a pruning probability for each weight, and pruning is guided by sampling from the pruning probabilities. A mechanism is designed to increase and decrease pruning probabilities based on importance criteria in the training process. Experiments show that, with 4x speedup, SPP can accelerate AlexNet with only 0.3% loss of top-5 accuracy and VGG-16 with 0.8% loss of top-5 accuracy in ImageNet classification. Moreover, SPP can be directly applied to accelerate multi-branch CNN networks, such as ResNet, without specific adaptations. Our 2x speedup ResNet-50 only suffers 0.8% loss of top-5 accuracy on ImageNet. We further show the effectiveness of SPP on transfer learning tasks.

Machine Learning Machine Learning

Training conformal predictors

208 - Nicolo Colombo , Vladimir Vovk 2020

Efficiency criteria for conformal prediction, such as emph{observed fuzziness} (i.e., the sum of p-values associated with false labels), are commonly used to emph{evaluate} the performance of given conformal predictors. Here, we investigate whether it is possible to exploit efficiency criteria to emph{learn} classifiers, both conformal predictors and point classifiers, by using such criteria as training objective functions. The proposed idea is implemented for the problem of binary classification of hand-written digits. By choosing a 1-dimensional model class (with one real-valued free parameter), we can solve the optimization problems through an (approximate) exhaustive search over (a discrete version of) the parameter space. Our empirical results suggest that conformal predictors trained by minimizing their observed fuzziness perform better than conformal predictors trained in the traditional way by minimizing the emph{prediction error} of the corresponding point classifier. They also have a reasonable performance in terms of their prediction error on the test set.

Machine Learning Machine Learning

Structured Synthesis for Probabilistic Systems

58 - Nils Jansen , Laura Humphrey , Jana Tumova 2018

We introduce the concept of structured synthesis for Markov decision processes where the structure is induced from finitely many pre-specified options for a system configuration. The resulting synthesis problem is in general a nonlinear programming problem (NLP) with integer variables. As solving NLPs is in general not feasible, we present an alternative approach. We present a transformation of models specified in the {PRISM} probabilistic programming language to models that account for all possible system configurations by means of nondeterministic choices. Together with a control module that ensures consistent configurations throughout the system, this transformation enables the use of optimized tools for model checking in a black-box fashion. While this transformation increases the size of a model, experiments with standard benchmarks show that the method provides a feasible approach for structured synthesis. Moreover, we demonstrate the usefulness along a realistic case study involving surveillance by unmanned aerial vehicles in a shipping facility.

Software Engineering

Stronger NAS with Weaker Predictors

83 - Junru Wu , Xiyang Dai , Dongdong Chen 2021

Neural Architecture Search (NAS) often trains and evaluates a large number of architectures. Recent predictor-based NAS approaches attempt to address such heavy computation costs with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Given limited samples, these predictors, however, are far from accurate to locate top architectures due to the difficulty of fitting the huge search space. This paper reflects on a simple yet crucial question: if our final goal is to find the best architecture, do we really need to model the whole space well?. We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors. As a key property of the proposed weak predictors, their probabilities of sampling better architectures keep increasing. Hence we only sample a few well-performed architectures guided by the previously learned predictor and estimate a new better weak predictor. This embarrassingly easy framework produces coarse-to-fine iteration to refine the ranking of sampling space gradually. Extensive experiments demonstrate that our method costs fewer samples to find top-performance architectures on NAS-Bench-101 and NAS-Bench-201, as well as achieves the state-of-the-art ImageNet performance on the NASNet search space. In particular, compared to state-of-the-art (SOTA) predictor-based NAS methods, WeakNAS outperforms all of them with notable margins, e.g., requiring at least 7.5x less samples to find global optimal on NAS-Bench-101; and WeakNAS can also absorb them for further performance boost. We further strike the new SOTA result of 81.3% in the ImageNet MobileNet Search Space. The code is available at https://github.com/VITA-Group/WeakNAS.

Machine Learning Computer Vision and Pattern Recognition Machine Learning

Probabilistic Structured Predictors

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions