No Arabic abstract
In the field of multi-objective optimization algorithms, multi-objective Bayesian Global Optimization (MOBGO) is an important branch, in addition to evolutionary multi-objective optimization algorithms (EMOAs). MOBGO utilizes Gaussian Process models learned from previous objective function evaluations to decide the next evaluation site by maximizing or minimizing an infill criterion. A common criterion in MOBGO is the Expected Hypervolume Improvement (EHVI), which shows a good performance on a wide range of problems, with respect to exploration and exploitation. However, so far it has been a challenge to calculate exact EHVI values efficiently. In this paper, an efficient algorithm for the computation of the exact EHVI for a generic case is proposed. This efficient algorithm is based on partitioning the integration volume into a set of axis-parallel slices. Theoretically, the upper bound time complexities are improved from previously $O (n^2)$ and $O(n^3)$, for two- and three-objective problems respectively, to $Theta(nlog n)$, which is asymptotically optimal. This article generalizes the scheme in higher dimensional case by utilizing a new hyperbox decomposition technique, which was proposed by D{a}chert et al, EJOR, 2017. It also utilizes a generalization of the multilayered integration scheme that scales linearly in the number of hyperboxes of the decomposition. The speed comparison shows that the proposed algorithm in this paper significantly reduces computation time. Finally, this decomposition technique is applied in the calculation of the Probability of Improvement (PoI).
Optimizing multiple competing black-box objectives is a challenging problem in many fields, including science, engineering, and machine learning. Multi-objective Bayesian optimization is a powerful approach for identifying the optimal trade-offs between the objectives with very few function evaluations. However, existing methods tend to perform poorly when observations are corrupted by noise, as they do not take into account uncertainty in the true Pareto frontier over the previously evaluated designs. We propose a novel acquisition function, NEHVI, that overcomes this important practical limitation by applying a Bayesian treatment to the popular expected hypervolume improvement criterion to integrate over this uncertainty in the Pareto frontier. We further argue that, even in the noiseless setting, the problem of generating multiple candidates in parallel reduces that of handling uncertainty in the Pareto frontier. Through this lens, we derive a natural parallel variant of NEHVI that can efficiently generate large batches of candidates. We provide a theoretical convergence guarantee for optimizing a Monte Carlo estimator of NEHVI using exact sample-path gradients. Empirically, we show that NEHVI achieves state-of-the-art performance in noisy and large-batch environments.
In many real-world scenarios, decision makers seek to efficiently optimize multiple competing objectives in a sample-efficient fashion. Multi-objective Bayesian optimization (BO) is a common approach, but many of the best-performing acquisition functions do not have known analytic gradients and suffer from high computational overhead. We leverage recent advances in programming models and hardware acceleration for multi-objective BO using Expected Hypervolume Improvement (EHVI)---an algorithm notorious for its high computational complexity. We derive a novel formulation of q-Expected Hypervolume Improvement (qEHVI), an acquisition function that extends EHVI to the parallel, constrained evaluation setting. qEHVI is an exact computation of the joint EHVI of q new candidate points (up to Monte-Carlo (MC) integration error). Whereas previous EHVI formulations rely on gradient-free acquisition optimization or approximated gradients, we compute exact gradients of the MC estimator via auto-differentiation, thereby enabling efficient and effective optimization using first-order and quasi-second-order methods. Our empirical evaluation demonstrates that qEHVI is computationally tractable in many practical scenarios and outperforms state-of-the-art multi-objective BO algorithms at a fraction of their wall time.
Bayesian method is capable of capturing real world uncertainties/incompleteness and properly addressing the over-fitting issue faced by deep neural networks. In recent years, Bayesian Neural Networks (BNNs) have drawn tremendous attentions of AI researchers and proved to be successful in many applications. However, the required high computation complexity makes BNNs difficult to be deployed in computing systems with limited power budget. In this paper, an efficient BNN inference flow is proposed to reduce the computation cost then is evaluated by means of both software and hardware implementations. A feature decomposition and memorization (texttt{DM}) strategy is utilized to reform the BNN inference flow in a reduced manner. About half of the computations could be eliminated compared to the traditional approach that has been proved by theoretical analysis and software validations. Subsequently, in order to resolve the hardware resource limitations, a memory-friendly computing framework is further deployed to reduce the memory overhead introduced by texttt{DM} strategy. Finally, we implement our approach in Verilog and synthesise it with 45 $nm$ FreePDK technology. Hardware simulation results on multi-layer BNNs demonstrate that, when compared with the traditional BNN inference method, it provides an energy consumption reduction of 73% and a 4$times$ speedup at the expense of 14% area overhead.
In recent years, deep neural networks (DNN) have become a highly active area of research, and shown remarkable achievements on a variety of computer vision tasks. DNNs, however, are known to often make overconfident yet incorrect predictions on out-of-distribution samples, which can be a major obstacle to real-world deployments because the training dataset is always limited compared to diverse real-world samples. Thus, it is fundamental to provide guarantees of robustness to the distribution shift between training and test time when we construct DNN models in practice. Moreover, in many cases, the deep learning models are deployed as black boxes and the performance has been already optimized for a training dataset, thus changing the black box itself can lead to performance degradation. We here study the robustness to the geometric transformations in a specific condition where the black-box image classifier is given. We propose an additional learner, emph{REinforcement Spatial Transform learner (REST)}, that transforms the warped input data into samples regarded as in-distribution by the black-box models. Our work aims to improve the robustness by adding a REST module in front of any black boxes and training only the REST module without retraining the original black box model in an end-to-end manner, i.e. we try to convert the real-world data into training distribution which the performance of the black-box model is best suited for. We use a confidence score that is obtained from the black-box model to determine whether the transformed input is drawn from in-distribution. We empirically show that our method has an advantage in generalization to geometric transformations and sample efficiency.
Computing the expectation of kernel functions is a ubiquitous task in machine learning, with applications from classical support vector machines to exploiting kernel embeddings of distributions in probabilistic modeling, statistical inference, causal discovery, and deep learning. In all these scenarios, we tend to resort to Monte Carlo estimates as expectations of kernels are intractable in general. In this work, we characterize the conditions under which we can compute expected kernels exactly and efficiently, by leveraging recent advances in probabilistic circuit representations. We first construct a circuit representation for kernels and propose an approach to such tractable computation. We then demonstrate possible advancements for kernel embedding frameworks by exploiting tractable expected kernels to derive new algorithms for two challenging scenarios: 1) reasoning under missing data with kernel support vector regressors; 2) devising a collapsed black-box importance sampling scheme. Finally, we empirically evaluate both algorithms and show that they outperform standard baselines on a variety of datasets.