A simple Bayesian approach to nonparametric regression is described using fuzzy sets and membership functions. Membership functions are interpreted as likelihood functions for the unknown regression function, so that, with the help of a reference prior, they can be transformed into prior density functions. The unknown regression function is decomposed into wavelets, and a hierarchical Bayesian approach is employed for making inferences on the resulting wavelet coefficients.
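As a rough illustration of the wavelet step only, here is a minimal Python sketch (assuming numpy and PyWavelets): the paper's membership-function likelihoods and reference prior are replaced by a generic zero-mean Gaussian prior on the detail coefficients, so the posterior mean is a simple linear shrinkage; the test signal, noise level, and prior variance tau2 are illustrative choices, not the paper's.

    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 256)
    f = np.sin(4 * np.pi * x) * np.exp(-2 * x)      # "unknown" regression function
    y = f + 0.1 * rng.standard_normal(x.size)       # noisy observations

    # Decompose the observations into wavelet coefficients.
    coeffs = pywt.wavedec(y, 'db4', level=4)

    # Posterior-mean shrinkage of each detail coefficient d under a
    # zero-mean Gaussian prior: E[theta | d] = d * tau2 / (tau2 + sigma^2).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # noise scale from finest level
    tau2 = 0.05                                     # prior variance (illustrative)
    shrunk = [coeffs[0]] + [c * tau2 / (tau2 + sigma**2) for c in coeffs[1:]]

    f_hat = pywt.waverec(shrunk, 'db4')
    print(np.mean((f_hat - f) ** 2))                # in-sample MSE of the estimate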
In many applications there is interest in estimating the relation between a predictor and an outcome when the relation is known to be monotone or otherwise constrained due to the physical processes involved. We consider one such application--inferring time-resolved aerosol concentration from a low-cost differential pressure sensor. The objective is to estimate a monotone function and make inference on the scaled first derivative of the function. We proposed Bayesian nonparametric monotone regression which uses a Bernstein polynomial basis to construct the regression function and puts a Dirichlet process prior on the regression coefficients. The base measure of the Dirichlet process is a finite mixture of a mass point at zero and a truncated normal. This construction imposes monotonicity while clustering the basis functions. Clustering the basis functions reduces the parameter space and allows the estimated regression function to be linear. With the proposed approach we can make closed-formed inference on the derivative of the estimated function including full quantification of uncertainty. In a simulation study the proposed method performs similar to other monotone regression approaches when the true function is wavy but performs better when the true function is linear. We apply the method to estimate time-resolved aerosol concentration with a newly-developed portable aerosol monitor. The R package bnmr is made available to implement the method.
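A hedged sketch of the Bernstein-polynomial side of this construction, in Python with numpy and scipy: monotonicity is imposed by writing the coefficients as cumulative sums of nonnegative increments and solving a bounded least-squares problem. This is a frequentist stand-in for the paper's Dirichlet process prior (for the actual method, see the bnmr package); the degree and data-generating function are illustrative.

    from math import comb
    import numpy as np
    from scipy.optimize import lsq_linear

    def bernstein_basis(x, degree):
        # Columns are the Bernstein polynomials B_{k,degree}(x) on [0, 1].
        return np.column_stack([
            comb(degree, k) * x**k * (1 - x)**(degree - k)
            for k in range(degree + 1)
        ])

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, 200)
    y = np.log1p(5 * x) + 0.1 * rng.standard_normal(x.size)  # monotone truth

    degree = 10
    B = bernstein_basis(x, degree)
    # Reparameterize beta = L @ gamma with L lower-triangular ones, so the fit
    # is monotone nondecreasing whenever gamma_1, ..., gamma_degree >= 0.
    L = np.tril(np.ones((degree + 1, degree + 1)))
    lb = np.r_[-np.inf, np.zeros(degree)]
    res = lsq_linear(B @ L, y, bounds=(lb, np.inf))
    beta = L @ res.x

    grid = np.linspace(0, 1, 101)
    f_hat = bernstein_basis(grid, degree) @ beta     # monotone estimate
    # Derivative: degree * sum_k (beta_{k+1} - beta_k) B_{k,degree-1}(x).
    deriv = degree * (bernstein_basis(grid, degree - 1) @ np.diff(beta))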
We propose a novel broadcasting idea to model nonlinearity in tensor regression nonparametrically. Unlike existing nonparametric tensor regression models, the resulting model strikes a good balance between flexibility and interpretability. A penalized estimation procedure and a corresponding algorithm are proposed. Our theoretical investigation, which allows the dimensions of the tensor covariate to diverge, indicates that the proposed estimator enjoys a desirable convergence rate. We also provide a minimax lower bound, which characterizes the optimality of the proposed estimator in a wide range of scenarios. Numerical experiments confirm the theoretical findings and show that the proposed model has advantages over existing linear counterparts.
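To make the broadcasting idea concrete, the following toy Python sketch (numpy only) applies one shared univariate function, expanded in a small polynomial basis, entrywise to a vectorized tensor covariate and fits the resulting bilinear model by alternating ridge updates. The basis, penalty, and update scheme are illustrative assumptions, not the paper's estimator or its theory.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, M = 400, 64, 4                       # samples, vectorized tensor entries, basis size
    X = rng.uniform(-1, 1, (n, p))             # each row: a flattened tensor covariate
    c_true = rng.standard_normal(p) / np.sqrt(p)
    y = np.sin(np.pi * X) @ c_true + 0.1 * rng.standard_normal(n)

    # Broadcast a shared polynomial basis over every tensor entry:
    # y_i ~ sum_m alpha_m * (phi_m(X) @ c)_i, a bilinear model in (alpha, c).
    Phi = np.stack([X ** (m + 1) for m in range(M)])   # shape (M, n, p)

    c = 0.01 * rng.standard_normal(p)
    alpha = np.ones(M)
    lam = 1e-2                                 # ridge penalty (assumed)

    for _ in range(50):                        # alternating ridge updates
        W = np.tensordot(alpha, Phi, axes=1)   # (n, p): sum_m alpha_m phi_m(X)
        c = np.linalg.solve(W.T @ W + lam * np.eye(p), W.T @ y)
        U = Phi @ c                            # (M, n): u_m = phi_m(X) @ c
        alpha = np.linalg.solve(U @ U.T + lam * np.eye(M), U @ y)

    print(np.mean((np.tensordot(alpha, Phi, axes=1) @ c - y) ** 2))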
The simultaneous estimation of many parameters $\eta_i$, based on a corresponding set of observations $x_i$, for $i = 1, \ldots, n$, is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data $(x_i, \theta_i)$, where $\theta_i$ is a known nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the Nonparametric Empirical Bayes Smoothing Tweedie (NEST) estimator, which efficiently estimates $\eta_i$ and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. NEST is capable of handling a wider range of settings than previously proposed heterogeneous approaches, as it does not make any parametric assumptions on the prior distribution of $\eta_i$. The estimation framework is simple but general enough to accommodate any member of the exponential family of distributions. Our theoretical results show that NEST is asymptotically optimal, while simulation studies show that it outperforms competing methods, with substantial efficiency gains in many settings. The method is demonstrated on a data set measuring the performance gap in math scores between socioeconomically advantaged and disadvantaged students in K-12 schools.
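For intuition, here is a minimal Python sketch of the classical homogeneous-variance Tweedie correction that NEST generalizes: for $x_i \sim N(\mu_i, \sigma^2)$, the posterior mean is $E[\mu_i \mid x_i] = x_i + \sigma^2 \frac{d}{dx} \log f(x_i)$, with the marginal density $f$ replaced by a kernel density estimate. The three-point prior and the known, common $\sigma$ are illustrative; NEST itself handles heterogeneous nuisance parameters, which this sketch does not.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(3)
    n, sigma = 2000, 1.0
    mu = rng.choice([-2.0, 0.0, 3.0], size=n)        # unknown nonparametric prior
    x = mu + sigma * rng.standard_normal(n)          # observations

    kde = gaussian_kde(x)                            # marginal density estimate f_hat
    h = 1e-3                                         # finite-difference step
    score = (np.log(kde(x + h)) - np.log(kde(x - h))) / (2 * h)  # d/dx log f_hat
    mu_hat = x + sigma**2 * score                    # Tweedie posterior mean

    # Empirical Bayes risk vs. the naive estimate x_i itself.
    print(np.mean((mu_hat - mu) ** 2), np.mean((x - mu) ** 2))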
Wavelet shrinkage estimators are widely applied in several fields of science for denoising data in the wavelet domain by reducing the magnitudes of empirical coefficients. In the nonparametric regression problem, most shrinkage rules are derived from models composed of an unknown function with additive Gaussian noise. Although the Gaussian noise assumption is reasonable in many real data analyses, especially for large sample sizes, it is not general: data contaminated with positive noise can occur in practice, and nonparametric regression models with positive noise pose challenges from a wavelet shrinkage point of view. This work develops Bayesian shrinkage rules to estimate wavelet coefficients in a nonparametric regression framework with additive, strictly positive noise under exponential and lognormal distributions. Computational aspects are discussed, and simulation studies are conducted to analyse the performance of the proposed shrinkage rules and compare them with standard techniques. An application to the Boston Marathon winning times dataset is also provided.
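As a point of reference for the "standard techniques" mentioned above, here is a Python sketch (numpy and PyWavelets) of universal soft thresholding adapted to strictly positive noise by centering the data at the noise mean, which is assumed known here. The paper's Bayesian shrinkage rules under exponential and lognormal noise are not reproduced.

    import numpy as np
    import pywt

    rng = np.random.default_rng(4)
    x = np.linspace(0, 1, 512)
    f = x + (x > 0.4) + 0.5 * (x > 0.7)             # piecewise "true" function
    scale = 0.2                                     # exponential noise scale (assumed known)
    y = f + rng.exponential(scale=scale, size=x.size)  # strictly positive noise

    # Positive noise has nonzero mean E[e] = scale, so center the data first;
    # otherwise every empirical wavelet coefficient inherits a bias.
    coeffs = pywt.wavedec(y - scale, 'db4', level=5)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # robust noise-scale estimate
    thr = sigma * np.sqrt(2 * np.log(y.size))       # universal threshold
    den = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
    f_hat = pywt.waverec(den, 'db4')
    print(np.mean((f_hat - f) ** 2))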
One of the most popular methodologies for estimating the average treatment effect at the threshold in a regression discontinuity design is local linear regression (LLR), which places larger weight on units closer to the threshold. We propose a Gaussian process regression methodology that acts as a Bayesian analog to LLR for regression discontinuity designs. Our methodology provides a flexible fit for treatment and control responses by placing a general prior on the mean response functions. Furthermore, unlike LLR, our methodology can incorporate uncertainty in how units are weighted when estimating the treatment effect. We prove that our method is consistent in estimating the average treatment effect at the threshold. In simulations, our method exhibits promising coverage, interval length, and mean squared error properties compared to standard and state-of-the-art LLR methodologies. Finally, we explore the performance of our method on a real-world example by studying the impact of being a first-round draft pick on the performance and playing time of basketball players in the National Basketball Association.
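A minimal sketch of the two-GP construction in Python with scikit-learn, assuming an RBF-plus-noise kernel and a simulated design: separate Gaussian processes are fit on each side of the threshold, and the treatment effect is read off as the difference of the posterior means at the threshold, with a naive standard error for the difference. The paper's priors and its treatment of weighting uncertainty are not reproduced.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(5)
    n, c = 400, 0.0                                  # sample size, threshold
    z = rng.uniform(-1, 1, n)                        # running variable
    tau = 0.7                                        # true effect at the threshold
    y = np.sin(z) + tau * (z >= c) + 0.2 * rng.standard_normal(n)

    kernel = 1.0 * RBF(length_scale=0.5) + WhiteKernel(noise_level=0.04)
    gp0 = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(z[z < c, None], y[z < c])
    gp1 = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(z[z >= c, None], y[z >= c])

    m0, s0 = gp0.predict([[c]], return_std=True)     # control mean/sd at threshold
    m1, s1 = gp1.predict([[c]], return_std=True)     # treated mean/sd at threshold
    tau_hat = m1[0] - m0[0]
    se = np.hypot(s0[0], s1[0])                      # naive sd of the difference
    print(tau_hat, (tau_hat - 1.96 * se, tau_hat + 1.96 * se))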