Expander Framework for Generating High-Dimensional GLM Gradient and Hessian from Low-Dimensional Base Distributions: R Package RegressionFactory

247 0 0.0 ( 0 )

Download Cite

Added by Alireza Mahani

Publication date 2015

fields Mathematical Statistics

and research's language is English

Authors Alireza S. Mahani - Mansour T.A. Sharabiani

Computation

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The R package RegressionFactory provides expander functions for constructing the high-dimensional gradient vector and Hessian matrix of the log-likelihood function for generalized linear models (GLMs), from the lower-dimensional base-distribution derivatives. The software follows a modular implementation using the chain rule of derivatives. Such modularity offers a clear separation of case-specific components (base distribution functional form and link functions) from common steps (e.g., matrix algebra operations needed for expansion) in calculating log-likelihood derivatives. In doing so, RegressionFactory offers several advantages: 1) It provides a fast and convenient method for constructing log-likelihood and its derivatives by requiring only the low-dimensional, base-distribution derivatives, 2) The accompanying definiteness-invariance theorem allows researchers to reason about the negative-definiteness of the log-likelihood Hessian in the much lower-dimensional space of the base distributions, 3) The factorized, abstract view of regression suggests opportunities to generate novel regression models, and 4) Computational techniques for performance optimization can be developed generically in the abstract framework and be readily applicable across all the specific regression instances. We expect RegressionFactory to facilitate research and development on optimization and sampling techniques for GLM log-likelihoods as well as construction of composite models from GLM lego blocks, such as Hierarchical Bayesian models.

rate research

LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

72 - Brian L. Trippe , Jonathan H. Huggins , Raj Agrawal 2019

Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousand parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational-statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.

Computation Machine Learning Methodology

SIHR: An R Package for Statistical Inference in High-dimensional Linear and Logistic Regression Models

102 - Prabrisha Rakshit , T. Tony Cai , Zijian Guo 2021

We introduce and illustrate through numerical examples the R package texttt{SIHR} which handles the statistical inference for (1) linear and quadratic functionals in the high-dimensional linear regression and (2) linear functional in the high-dimensional logistic regression. The focus of the proposed algorithms is on the point estimation, confidence interval construction and hypothesis testing. The inference methods are extended to multiple regression models. We include real data applications to demonstrate the packages performance and practicality.

Computation Methodology Other Statistics

sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

121 - Jack Baker , Paul Fearnhead , Emily B. Fox 2017

This paper introduces the R package sgmcmc; which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by only using a subset of data at each iteration. SGMCMC requires calculating gradients of the log likelihood and log priors, which can be time consuming and error prone to perform by hand. The sgmcmc package calculates these gradients itself using automatic differentiation, making the implementation of these methods much easier. To do this, the package uses the software library TensorFlow, which has a variety of statistical distributions and mathematical operations as standard, meaning a wide class of models can be built using this framework. SGMCMC has become widely adopted in the machine learning literature, but less so in the statistics community. We believe this may be partly due to lack of software; this package aims to bridge this gap.

Computation Applications Machine Learning

Multivariate-from-Univariate MCMC Sampler: R Package MfUSampler

222 - Alireza S. Mahani , Mansour T.A. Sharabiani 2014

The R package MfUSampler provides Monte Carlo Markov Chain machinery for generating samples from multivariate probability distributions using univariate sampling algorithms such as Slice Sampler and Adaptive Rejection Sampler. The sampler function performs a full cycle of univariate sampling steps, one coordinate at a time. In each step, the latest sample values obtained for other coordinates are used to form the conditional distributions. The concept is an extension of Gibbs sampling where each step involves, not an independent sample from the conditional distribution, but a Markov transition for which the conditional distribution is invariant. The software relies on proportionality of conditional distributions to the joint distribution to implement a thin wrapper for producing conditionals. Examples illustrate basic usage as well as methods for improving performance. By encapsulating the multivariate-from-univariate logic, MfUSampler provides a reliable library for rapid prototyping of custom Bayesian models while allowing for incremental performance optimizations such as utilization of conjugacy, conditional independence, and porting function evaluations to compiled languages.

Computation

ProcData: An R Package for Process Data Analysis

167 - Xueying Tang , Susu Zhang , Zhi Wang 2020

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents response processes of solving the items. Process data analysis aims at enhancing educational assessment accuracy and serving other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article is designed to provide tools for processing, describing, and analyzing process data. We define an S3 class proc for organizing process data and extend generic methods summary and print for class proc. Two feature extraction methods for process data are implemented in the package for compressing information in the irregular response processes into regular numeric vectors. ProcData also provides functions for fitting and making predictions from a neural-network-based sequence model. These functions call relevant functions in package keras for constructing and training neural networks. In addition, several response process generators and a real dataset of response processes of the climate control item in the 2012 Programme for International Student Assessment are included in the package.

Computation Machine Learning