Combining matching and regression for causal inference provides double robustness in removing treatment-effect estimation bias due to confounding variables. In most real-world applications, however, treatment and control populations are not large enough for matching to achieve perfect or near-perfect balance on all confounding variables and their nonlinear/interaction functions, leading to trade-offs. Furthermore, in small samples variance contributes as much to total error as bias, and must therefore be factored into methodological decisions. In this paper, we develop a mathematical framework for quantifying the combined impact of matching and linear regression on the bias and variance of treatment-effect estimation. The framework includes expressions for bias and variance in a misspecified linear regression, theorems regarding the impact of matching on bias and variance, and a constrained bias-estimation approach for quantifying misspecification bias and combining it with variance to arrive at total error. Methodological decisions can thus be based on minimizing this total error, given the practitioner's assumption/belief about an intuitive parameter, which we call 'omitted R-squared'. The proposed methodology excludes the outcome variable from the analysis, thereby avoiding overfit creep and making it suitable for observational study designs. All core functions for bias and variance calculation, as well as diagnostic tools for bias-variance trade-off analysis, matching calibration, and power analysis, are made available to researchers and practitioners through an open-source R library, MatchLinReg.
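The bias-variance trade-off in matching can be illustrated with a minimal sketch (in Python, not the package's R interface; the greedy 1-nearest-neighbor matcher and the use of standardized mean difference as a bias proxy are illustrative choices, not MatchLinReg's actual algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated observational data: confounder x drives treatment assignment.
n = 2000
x = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-(x - 1)))   # treated units tend to have larger x
t = rng.random(n) < p_treat

def smd(x, t):
    """Standardized mean difference of x between treated and control
    (a common proxy for confounding bias)."""
    m1, m0 = x[t].mean(), x[~t].mean()
    s = np.sqrt((x[t].var() + x[~t].var()) / 2)
    return (m1 - m0) / s

# Greedy 1-nearest-neighbor matching on x, without replacement.
controls = np.flatnonzero(~t)
used = set()
matched = []
for i in np.flatnonzero(t):
    cand = [j for j in controls if j not in used]
    j = min(cand, key=lambda j: abs(x[i] - x[j]))
    used.add(j)
    matched.extend([i, j])
matched = np.array(matched)

# The trade-off: matching improves balance (smaller |SMD|, hence less
# confounding bias) but discards units (smaller n, hence higher variance).
print("balance before:", smd(x, t), "n =", n)
print("balance after: ", smd(x[matched], t[matched]), "n =", len(matched))
```

The matched sample shows markedly better covariate balance at the cost of a smaller sample; quantifying and minimizing the resulting total error is what the framework above formalizes.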
The R package sns implements the Stochastic Newton Sampler (SNS), a Metropolis-Hastings Markov chain Monte Carlo algorithm whose proposal density is a multivariate Gaussian based on a local, second-order Taylor-series expansion of the log-density. The mean of the proposal function is the full Newton step of the Newton-Raphson optimization algorithm. Taking advantage of the local, multivariate geometry captured in the log-density Hessian allows SNS to be more efficient than univariate samplers, approaching independent sampling as the density function increasingly resembles a multivariate Gaussian. SNS requires the log-density Hessian to be negative-definite everywhere in order to construct a valid proposal function; this property holds, or can be easily checked, for many GLM-like models. When the initial point is far from the density peak, running SNS in non-stochastic mode by taking the Newton step, augmented with line search, allows the MCMC chain to converge to high-density areas faster. For high-dimensional problems, partitioning the state space into lower-dimensional subsets and applying SNS to the subsets within a Gibbs sampling framework can significantly improve the mixing of SNS chains. In addition to the above strategies for improving convergence and mixing, sns offers diagnostics and visualization capabilities, as well as a function for sample-based calculation of Bayesian posterior predictive distributions.
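A minimal sketch of the SNS transition (in Python rather than the package's R interface; the quartic-penalized Gaussian target is a made-up example whose Hessian is negative-definite everywhere, as SNS requires):

```python
import numpy as np

def log_f(x):
    # Example target: Gaussian with a light quartic penalty.
    return -0.5 * x @ x - 0.05 * np.sum(x**4)

def grad(x):
    return -x - 0.2 * x**3

def hess(x):
    return np.diag(-1.0 - 0.6 * x**2)   # negative-definite for all x

def proposal_params(x):
    # Second-order Taylor expansion of log_f at x gives a Gaussian proposal:
    # mean = full Newton step, covariance = inverse of the negated Hessian.
    H = hess(x)
    mean = x - np.linalg.solve(H, grad(x))
    cov = np.linalg.inv(-H)
    return mean, cov

def log_q(y, mean, cov):
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(y) * np.log(2 * np.pi))

def sns(x0, n_iter, rng):
    x = np.asarray(x0, dtype=float)
    chain = []
    for _ in range(n_iter):
        m_x, c_x = proposal_params(x)
        y = rng.multivariate_normal(m_x, c_x)
        m_y, c_y = proposal_params(y)
        # Metropolis-Hastings acceptance with asymmetric proposal.
        log_alpha = (log_f(y) + log_q(x, m_y, c_y)) - (log_f(x) + log_q(y, m_x, c_x))
        if np.log(rng.random()) < log_alpha:
            x = y
        chain.append(x.copy())
    return np.array(chain)

rng = np.random.default_rng(1)
chain = sns(np.zeros(2), 2000, rng)
print("posterior mean ~", chain[500:].mean(axis=0))  # near 0 by symmetry
```

Because the target is close to Gaussian, the proposal nearly matches it and acceptance rates are high, illustrating the near-independent sampling regime described above.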
The R package RegressionFactory provides expander functions for constructing the high-dimensional gradient vector and Hessian matrix of the log-likelihood function for generalized linear models (GLMs) from the lower-dimensional base-distribution derivatives. The software follows a modular implementation using the chain rule of derivatives. Such modularity offers a clear separation of case-specific components (base-distribution functional form and link functions) from common steps (e.g., the matrix-algebra operations needed for expansion) in calculating log-likelihood derivatives. In doing so, RegressionFactory offers several advantages: 1) It provides a fast and convenient method for constructing the log-likelihood and its derivatives by requiring only the low-dimensional, base-distribution derivatives, 2) The accompanying definiteness-invariance theorem allows researchers to reason about the negative-definiteness of the log-likelihood Hessian in the much lower-dimensional space of the base distributions, 3) The factorized, abstract view of regression suggests opportunities to generate novel regression models, and 4) Computational techniques for performance optimization can be developed generically in the abstract framework and be readily applicable across all the specific regression instances. We expect RegressionFactory to facilitate research and development on optimization and sampling techniques for GLM log-likelihoods as well as construction of composite models from GLM lego blocks, such as Hierarchical Bayesian models.
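The chain-rule expansion at the heart of this design can be sketched as follows (a Python illustration, not the package's R API; `expand` and `logit_base` are hypothetical names). For a log-likelihood of the form sum of f(x_i'beta, y_i), the gradient is X'u and the Hessian is X'diag(w)X, where u and w are the first and second base-distribution derivatives with respect to the linear predictor:

```python
import numpy as np

def expand(X, dfdeta, d2fdeta2):
    """Expand base-distribution derivatives (w.r.t. eta = X @ beta) into the
    full gradient and Hessian w.r.t. beta via the chain rule:
       grad = X' u,   hess = X' diag(w) X."""
    g = X.T @ dfdeta
    H = X.T @ (d2fdeta2[:, None] * X)
    return g, H

def logit_base(eta, y):
    """Base distribution: Bernoulli log-likelihood with logit link.
    Returns value, d/deta, d2/deta2 (all low-dimensional)."""
    p = 1 / (1 + np.exp(-eta))
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik, y - p, -p * (1 - p)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([0.5, -1.0, 0.25])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

# Newton-Raphson on the expanded derivatives. Since d2f/deta2 < 0 everywhere
# in the base space, the expanded Hessian is negative-definite (for full-rank
# X), illustrating the definiteness-invariance reasoning described above.
beta = np.zeros(3)
for _ in range(25):
    _, u, w = logit_base(X @ beta, y)
    g, H = expand(X, u, w)
    beta = beta - np.linalg.solve(H, g)
print("estimate:", beta)
```

Note that only the three scalar derivatives of the base distribution had to be supplied; the matrix-algebra expansion step is entirely generic.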
The R package MfUSampler provides Markov chain Monte Carlo machinery for generating samples from multivariate probability distributions using univariate sampling algorithms such as the Slice Sampler and the Adaptive Rejection Sampler. The sampler function performs a full cycle of univariate sampling steps, one coordinate at a time. In each step, the latest sample values obtained for the other coordinates are used to form the conditional distributions. The concept is an extension of Gibbs sampling where each step involves not an independent sample from the conditional distribution, but a Markov transition for which the conditional distribution is invariant. The software relies on the proportionality of conditional distributions to the joint distribution to implement a thin wrapper for producing conditionals. Examples illustrate basic usage as well as methods for improving performance. By encapsulating the multivariate-from-univariate logic, MfUSampler provides a reliable library for rapid prototyping of custom Bayesian models, while allowing for incremental performance optimizations such as utilization of conjugacy, conditional independence, and porting function evaluations to compiled languages.
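The multivariate-from-univariate cycle can be sketched as follows (a Python illustration under simplifying assumptions, not MfUSampler's R interface; `slice_1d` and `mfu_cycle` are hypothetical names). Note how each conditional is formed simply by evaluating the joint log-density with all other coordinates fixed, exploiting the proportionality mentioned above:

```python
import numpy as np

def slice_1d(logf, x0, rng, w=1.0, max_steps=50):
    """One univariate slice-sampling transition (stepping-out + shrinkage)."""
    logy = logf(x0) + np.log(rng.random())   # auxiliary slice level
    L = x0 - w * rng.random()
    R = L + w
    for _ in range(max_steps):               # step out left
        if logf(L) <= logy: break
        L -= w
    for _ in range(max_steps):               # step out right
        if logf(R) <= logy: break
        R += w
    while True:                              # shrink until accepted
        x1 = L + rng.random() * (R - L)
        if logf(x1) > logy:
            return x1
        if x1 < x0: L = x1
        else:       R = x1

def mfu_cycle(logp, x, rng):
    """Full cycle of univariate updates; logp is the joint log-density,
    proportional to each conditional when the others are held fixed."""
    x = x.copy()
    for k in range(len(x)):
        def cond(v, k=k):
            xk = x.copy(); xk[k] = v
            return logp(xk)
        x[k] = slice_1d(cond, x[k], rng)
    return x

# Example target: correlated bivariate Gaussian (correlation 0.8).
P = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
logp = lambda x: -0.5 * x @ P @ x

rng = np.random.default_rng(2)
x = np.zeros(2)
samples = []
for _ in range(3000):
    x = mfu_cycle(logp, x, rng)
    samples.append(x)
S = np.array(samples)
print("sample cov:\n", np.cov(S.T))
```

Each coordinate update is a Markov transition leaving the conditional invariant, so the cycle targets the joint distribution even though no conditional is ever sampled exactly.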
Single-Instruction, Multiple-Data (SIMD) random number generators (RNGs) take advantage of vector units to offer significant performance gains over non-vectorized libraries, but they often rely on batch production of deviates from distributions with fixed parameters. In many statistical applications such as Gibbs sampling, the parameters of sampled distributions change from one iteration to the next, requiring that random deviates be generated one at a time. This situation can render vectorized RNGs inefficient, and even inferior to their scalar counterparts. The C++ class BatchRNG uses buffers of base distributions such as uniform, Gaussian, and exponential to take advantage of vector units while allowing for sequences of deviates to be generated with varying parameters. These small buffers are consumed and replenished as needed during program execution. Performance tests using the Intel Vector Statistical Library (VSL) on various probability distributions illustrate the effectiveness of the proposed batching strategy.
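The buffering strategy can be sketched in a few lines (a Python illustration of the idea, not the C++ class itself; class and method names are hypothetical). Base deviates are produced in large vectorized batches, then consumed one at a time with a cheap location-scale transform applied per call:

```python
import numpy as np

class BatchRNG:
    """Sketch of a buffered RNG: standard-normal deviates are generated in
    batches (where vectorized generation is fast), then consumed one at a
    time with per-call parameters applied via a location-scale transform."""

    def __init__(self, buf_size=1024, seed=0):
        self.rng = np.random.default_rng(seed)
        self.buf_size = buf_size
        self.normal_buf = self.rng.standard_normal(buf_size)  # batch fill
        self.normal_pos = 0

    def normal(self, mean=0.0, sd=1.0):
        if self.normal_pos == self.buf_size:        # replenish when drained
            self.normal_buf = self.rng.standard_normal(self.buf_size)
            self.normal_pos = 0
        z = self.normal_buf[self.normal_pos]
        self.normal_pos += 1
        return mean + sd * z                        # parameters vary per call

# Gibbs-style usage: parameters differ on every call, yet all base deviates
# came from large batched draws.
brng = BatchRNG(seed=42)
draws = [brng.normal(mean=m, sd=0.1) for m in range(5)]
print(draws)
```

The same pattern extends to uniform and exponential buffers, since those families also admit cheap per-deviate transforms (scaling and shifting).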