Do you want to publish a course? Click here

JSRT: James-Stein Regression Tree

283   0   0.0 ( 0 )
 Added by Xingchun Xiang
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Regression tree (RT) has been widely used in machine learning and data mining community. Given a target data for prediction, a regression tree is first constructed based on a training dataset before making prediction for each leaf node. In practice, the performance of RT relies heavily on the local mean of samples from an individual node during the tree construction/prediction stage, while neglecting the global information from different nodes, which also plays an important role. To address this issue, we propose a novel regression tree, named James-Stein Regression Tree (JSRT) by considering global information from different nodes. Specifically, we incorporate the global mean information based on James-Stein estimator from different nodes during the construction/predicton stage. Besides, we analyze the generalization error of our method under the mean square error (MSE) metric. Extensive experiments on public benchmark datasets verify the effectiveness and efficiency of our method, and demonstrate the superiority of our method over other RT prediction methods.

rate research

Read More

Ensembles of deep neural networks have achieved great success recently, but they do not offer a proper Bayesian justification. Moreover, while they allow for averaging of predictions over several hypotheses, they do not provide any guarantees for their diversity, leading to redundant solutions in function space. In contrast, particle-based inference methods, such as Stein variational gradient descent (SVGD), offer a Bayesian framework, but rely on the choice of a kernel to measure the similarity between ensemble members. In this work, we study different SVGD methods operating in the weight space, function space, and in a hybrid setting. We compare the SVGD approaches to other ensembling-based methods in terms of their theoretical properties and assess their empirical performance on synthetic and real-world tasks. We find that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles. It improves on functional diversity and uncertainty estimation and approaches the true Bayesian posterior more closely. Moreover, we show that using stochastic SVGD updates, as opposed to the standard deterministic ones, can further improve the performance.
Stein variational gradient descent (SVGD) and its variants have shown promising successes in approximate inference for complex distributions. However, their empirical performance depends crucially on the choice of optimal kernel. Unfortunately, RBF kernel with median heuristics is a common choice in previous approaches which has been proved sub-optimal. Inspired by the paradigm of multiple kernel learning, our solution to this issue is using a combination of multiple kernels to approximate the optimal kernel instead of a single one which may limit the performance and flexibility. To do so, we extend Kernelized Stein Discrepancy (KSD) to its multiple kernel view called Multiple Kernelized Stein Discrepancy (MKSD). Further, we leverage MKSD to construct a general algorithm based on SVGD, which be called Multiple Kernel SVGD (MK-SVGD). Besides, we automatically assign a weight to each kernel without any other parameters. The proposed method not only gets rid of optimal kernel dependence but also maintains computational effectiveness. Experiments on various tasks and models show the effectiveness of our method.
To measure the similarity of documents, the Wasserstein distance is a powerful tool, but it requires a high computational cost. Recently, for fast computation of the Wasserstein distance, methods for approximating the Wasserstein distance using a tree metric have been proposed. These tree-based methods allow fast comparisons of a large number of documents; however, they are unsupervised and do not learn task-specific distances. In this work, we propose the Supervised Tree-Wasserstein (STW) distance, a fast, supervised metric learning method based on the tree metric. Specifically, we rewrite the Wasserstein distance on the tree metric by the parent-child relationships of a tree and formulate it as a continuous optimization problem using a contrastive loss. Experimentally, we show that the STW distance can be computed fast, and improves the accuracy of document classification tasks. Furthermore, the STW distance is formulated by matrix multiplications, runs on a GPU, and is suitable for batch processing. Therefore, we show that the STW distance is extremely efficient when comparing a large number of documents.
We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $Obig(T^{1/3} log^{2/3}(T)big)$ and present a matching $Omega(T^{1/3})$ lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to $O(log T)$ or even to $O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss.
Decisions are increasingly taken by both humans and machine learning models. However, machine learning models are currently trained for full automation -- they are not aware that some of the decisions may still be taken by humans. In this paper, we take a first step towards the development of machine learning models that are optimized to operate under different automation levels. More specifically, we first introduce the problem of ridge regression under human assistance and show that it is NP-hard. Then, we derive an alternative representation of the corresponding objective function as a difference of nondecreasing submodular functions. Building on this representation, we further show that the objective is nondecreasing and satisfies $alpha$-submodularity, a recently introduced notion of approximate submodularity. These properties allow a simple and efficient greedy algorithm to enjoy approximation guarantees at solving the problem. Experiments on synthetic and real-world data from two important applications -- medical diagnosis and content moderation-demonstrate that our algorithm outsources to humans those samples in which the prediction error of the ridge regression model would have been the highest if it had to make a prediction, it outperforms several competitive baselines, and its performance is robust with respect to several design choices and hyperparameters used in the experiments.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا