Building surrogate models is one common approach when we attempt to learn unknown black-box functions. Bayesian optimization provides a framework that allows us to build surrogate models based on sequential samples drawn from the function and to find its optimum. Tuning algorithmic parameters to optimize the performance of large, complicated black-box application codes is one important such application, which aims at finding the optima of black-box functions. Within the Bayesian optimization framework, the Gaussian process model produces smooth or continuous sample paths; however, the black-box function in the tuning problem is often non-smooth. This difficult tuning problem is worsened by the fact that we usually have only limited sequential samples from the black-box function. Motivated by these issues encountered in tuning, we propose a novel additive Gaussian process model called the clustered Gaussian process (cGP), where the additive components are induced by clustering. By using this surrogate model, we aim to capture the non-smoothness of the black-box function. In addition to an algorithm for constructing this model, we apply the model to several artificial and real applications to evaluate it. In the examples we studied, the performance can be improved by as much as 90% across repeated experiments.
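As a rough illustration of the clustering idea only (this is not the paper's cGP algorithm), the sketch below clusters the observed samples jointly on inputs and responses, fits one independent GP per cluster with scikit-learn, and routes predictions through the nearest cluster; the piecewise test function and all parameter choices are assumptions made for the example.

```python
# Toy sketch (not the paper's cGP): cluster samples, fit one GP per cluster,
# predict through the nearest cluster. All names/settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# A piecewise (non-smooth) stand-in for the black-box function on [0, 1].
def black_box(x):
    return np.where(x < 0.5, np.sin(10 * x), 2.0 + 0.5 * x)

X = rng.uniform(0, 1, size=(40, 1))
y = black_box(X).ravel()

# Cluster jointly on inputs and responses so the partition tracks the jump.
k = 2
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    np.column_stack([X, y]))

# One local GP surrogate per cluster.
gps, centers = [], []
for c in range(k):
    idx = labels == c
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-6)
    gp.fit(X[idx], y[idx])
    gps.append(gp)
    centers.append(X[idx].mean(axis=0))

def predict(x_new):
    """Predict with the GP of the nearest cluster center."""
    x_new = np.atleast_2d(x_new)
    d = np.linalg.norm(x_new[:, None, :] - np.array(centers)[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    return np.array([gps[a].predict(x_new[i:i + 1])[0]
                     for i, a in enumerate(assign)])

print(predict([[0.25], [0.75]]))
```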
Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower-dimensional space. This aspect is important because conclusions based on faulty visual reasoning can be harmful: judging dissimilar points as similar, or similar points as dissimilar, on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce two new algorithms that respectively select a subset of rows and columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.
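One standard way to pick a distance-preserving row subset, shown here only as a hedged illustration and not as the paper's sketching algorithm, is greedy farthest-point (maxmin) sampling; the function name and parameters below are hypothetical.

```python
# Minimal sketch (not the paper's algorithms): greedy farthest-point sampling
# selects rows whose pairwise distances roughly cover those of the full matrix.
import numpy as np

def farthest_point_rows(X, k, seed=0):
    """Return indices of k rows chosen by greedy maxmin (farthest-point) sampling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    chosen = [rng.integers(n)]
    # Distance from every row to the nearest chosen row.
    d = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())              # row farthest from the current subset
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

# Example: sketch a 1000 x 20 matrix down to 50 representative rows.
X = np.random.default_rng(1).normal(size=(1000, 20))
rows = farthest_point_rows(X, 50)
print(X[rows].shape)   # (50, 20)
```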
Topological data analysis (TDA) allows us to explore the topological features of a dataset. Among topological features, lower-dimensional ones have recently drawn the attention of practitioners in mathematics and statistics due to their potential to aid the discovery of low-dimensional structure in a dataset. However, lower-dimensional features are usually challenging to detect from a probabilistic perspective. In this paper, lower-dimensional topological features occurring as zero-density regions of density functions are introduced and thoroughly investigated. Specifically, we consider sequences of coverings for the support of a density function in which the coverings are comprised of balls with shrinking radii. We show that, when these coverings satisfy certain sufficient conditions as the sample size goes to infinity, we can detect lower-dimensional, zero-density regions with increasingly high probability while guarding against false detection. We supplement the theoretical developments with a discussion of simulated experiments that elucidate the behavior of the methodology for different choices of the tuning parameters that govern the construction of the covering sequences and characterize the asymptotic results.
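The toy sketch below conveys the flavor of the idea, not the paper's procedure: samples are drawn from a density with a zero-density gap, and grid ball centers that receive no samples within a shrinking radius are flagged as candidate zero-density regions; the radius schedule is an illustrative assumption rather than the paper's rate.

```python
# Toy sketch (not the paper's method): flag balls of shrinking radius that
# contain no sample points as candidate zero-density regions.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Density supported on two horizontal bands, with a zero-density gap near y = 0.5.
x = rng.uniform(0, 1, n)
y = np.where(rng.uniform(size=n) < 0.5,
             rng.uniform(0.0, 0.4, n), rng.uniform(0.6, 1.0, n))
pts = np.column_stack([x, y])

r = 0.5 * n ** (-1 / 4)          # illustrative shrinking-radius schedule
grid = np.stack(np.meshgrid(np.linspace(0, 1, 40),
                            np.linspace(0, 1, 40)), axis=-1).reshape(-1, 2)

# A grid ball is "empty" if no sample lies within radius r of its center.
dists = np.linalg.norm(grid[:, None, :] - pts[None, :, :], axis=2)
empty = (dists > r).all(axis=1)
print(f"radius {r:.3f}: {empty.sum()} of {len(grid)} ball centers are empty")
```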
Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization of high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account the roughness of circular coordinates in change-point and high-dimensional applications. We use a generalized penalty function instead of an $L_{2}$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analyses to support our claim that circular coordinates with a generalized penalty can detect changes in high-dimensional datasets under different sampling schemes while preserving the topological structures.
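To illustrate only the penalty swap (persistent cohomology itself is not computed here), the sketch below smooths an integer cocycle on a cycle graph by minimizing the edge residuals under an $L_{2}$ versus an $L_{1}$ penalty; the graph, cocycle, and optimizer choice are assumptions made for the example.

```python
# Toy sketch of the penalty swap only: smooth a cocycle z on a cycle graph by
# minimizing sum_e |(f_j - f_i) - z_e|^p, with p = 2 (classical) vs p = 1.
import numpy as np
from scipy.optimize import minimize

m = 12                                    # vertices on a cycle graph
edges = [(i, (i + 1) % m) for i in range(m)]
z = np.zeros(len(edges)); z[-1] = 1.0     # cocycle winding once around the loop

def residuals(f):
    return np.array([f[j] - f[i] - z[k] for k, (i, j) in enumerate(edges)])

def objective(f, p):
    return np.sum(np.abs(residuals(f)) ** p)

f0 = np.zeros(m)
f2 = minimize(objective, f0, args=(2,), method="Powell").x   # L2 smoothing
f1 = minimize(objective, f0, args=(1,), method="Powell").x   # L1 smoothing

# L2 spreads the unit of winding correction evenly over all edges, while L1
# concentrates it on a single edge, giving a "locally flat" circular coordinate.
print(np.round(residuals(f2), 2))
print(np.round(residuals(f1), 2))
```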
A fundamental problem in computer vision is boundary estimation, where the goal is to delineate the boundary of objects in an image. In this paper, we propose a method that jointly incorporates geometric and topological information within an image to simultaneously estimate boundaries for objects with more complex topologies. We use a topological clustering-based method to assist initialization of the Bayesian active contour model. This combines pixel clustering, boundary smoothness, and potential prior shape information to produce an estimated object boundary. Active contour methods are known to be extremely sensitive to algorithm initialization, relying on the user to provide a reasonable starting curve to the algorithm. In the presence of images featuring objects with complex topological structures, such as objects with holes or multiple objects, the user must initialize separate curves for each boundary of interest. Our proposed topologically-guided method can provide an interpretable, smart initialization in these settings, freeing the user from potential pitfalls associated with objects of complex topological structure. We provide a detailed simulation study comparing our initialization to boundary estimates obtained from standard segmentation algorithms. The method is demonstrated on artificial image datasets from computer vision, as well as in real-world applications to skin lesion and neural cellular images, for which multiple topological features can be identified.
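As a hedged illustration of a clustering-driven initialization (not the paper's full topological method), the sketch below clusters pixel intensities with scikit-learn and extracts one initial contour per connected boundary with scikit-image, so an object with a hole yields both an outer and an inner starting curve.

```python
# Minimal sketch (not the paper's method): cluster pixel intensities, then take
# each closed contour of the foreground mask as an initial curve for an active
# contour. The synthetic annulus and all settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from skimage import measure

# Synthetic image: a bright annulus (object with a hole) on a dark background.
yy, xx = np.mgrid[0:128, 0:128]
r = np.hypot(yy - 64, xx - 64)
img = ((r > 20) & (r < 45)).astype(float) \
    + 0.05 * np.random.default_rng(0).normal(size=(128, 128))

# Two-cluster segmentation of pixel intensities (foreground vs background).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    img.reshape(-1, 1)).reshape(img.shape)
fg = labels == labels[64, 32]            # cluster containing a known bright pixel

# Each closed contour of the mask is a candidate initial curve; the annulus
# yields two curves: one outer boundary and one inner (hole) boundary.
contours = measure.find_contours(fg.astype(float), 0.5)
print(f"{len(contours)} initial curves, lengths: {[len(c) for c in contours]}")
```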
In this paper we introduce a novel model for Gaussian process (GP) regression in the fully Bayesian setting. Motivated by the ideas of sparsification, localization, and Bayesian additive modeling, our model is built around a recursive partitioning (RP) scheme. Within each RP partition, a sparse GP (SGP) regression model is fitted. A Bayesian additive framework then combines multiple layers of partitioned SGPs, capturing both global trends and local refinements with efficient computations. The model addresses both the problem of efficiency in fitting a full Gaussian process regression model and the problem of prediction performance associated with a single SGP. Our approach mitigates the issue of pseudo-input selection and avoids the need for complex inter-block correlations required by existing methods. The crucial trade-off becomes choosing between many simpler local model components or fewer, more complex global model components, which the practitioner can tune sensibly. Implementation is via a Metropolis-Hastings Markov chain Monte Carlo algorithm with Bayesian back-fitting. We compare our model against popular alternatives on simulated and real datasets, and find that its performance is competitive, while the fully Bayesian procedure enables the quantification of model uncertainties.
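The sketch below conveys the layered additive idea only; it is not the paper's fully Bayesian model and performs no MCMC. Layer one is a crude stand-in for a sparse GP (a GP fit on a random data subset), and layer two fits local GPs to the residuals on a fixed partition; all names and settings are assumptions.

```python
# Minimal sketch (not the paper's model): a two-layer additive surrogate where a
# "sparse" global GP captures the trend and block-local GPs refine its residuals.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 1))
y = np.sin(6 * X).ravel() + 0.3 * np.sin(40 * X).ravel() + 0.05 * rng.normal(size=300)

# Layer 1: global trend from a GP on a small random subset (stand-in for an SGP).
sub = rng.choice(len(X), size=30, replace=False)
gp_global = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-2).fit(X[sub], y[sub])
resid = y - gp_global.predict(X)

# Layer 2: a fixed partition into equal blocks; local GPs fit the residuals.
n_blocks = 4
edges = np.linspace(0, 1, n_blocks + 1)
local_gps = []
for b in range(n_blocks):
    idx = (X[:, 0] >= edges[b]) & (X[:, 0] <= edges[b + 1])
    local_gps.append(GaussianProcessRegressor(kernel=RBF(0.05), alpha=1e-2)
                     .fit(X[idx], resid[idx]))

def predict(x_new):
    """Additive prediction: global trend plus the local block refinement."""
    x_new = np.atleast_2d(x_new)
    out = gp_global.predict(x_new)
    blocks = np.clip(np.searchsorted(edges, x_new[:, 0], side="right") - 1,
                     0, n_blocks - 1)
    for b in range(n_blocks):
        m = blocks == b
        if m.any():
            out[m] += local_gps[b].predict(x_new[m])
    return out

print(predict([[0.1], [0.55], [0.9]]))
```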