Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods.
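As a rough illustration of why the learning rate matters, the following minimal sketch implements plain Hedge with a fixed learning rate and runs it on an easy instance where one action is always better. It does not reproduce the adaptive procedure proposed in the abstract, whose details are not given here. The function names `hedge_weights` and `run_hedge`, the toy loss sequence, and the value 5.0 are assumptions made for illustration; the worst-case tuning sqrt(8 ln K / T) is the standard textbook choice for losses in [0, 1].

```python
import math

def hedge_weights(cumulative_losses, eta):
    """Hedge distribution: weights proportional to exp(-eta * cumulative loss)."""
    m = min(cumulative_losses)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (loss - m)) for loss in cumulative_losses]
    total = sum(w)
    return [x / total for x in w]

def run_hedge(loss_rounds, eta):
    """Run Hedge with a fixed learning rate eta.

    Returns (expected loss of the algorithm, regret relative to the best action).
    """
    k = len(loss_rounds[0])
    cumulative = [0.0] * k
    alg_loss = 0.0
    for losses in loss_rounds:
        p = hedge_weights(cumulative, eta)
        # Expected loss of playing an action drawn from p in this round.
        alg_loss += sum(pi * li for pi, li in zip(p, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return alg_loss, alg_loss - min(cumulative)

# Hypothetical easy instance: action 0 always has loss 0, action 1 always has loss 1.
T, K = 100, 2
easy = [[0.0, 1.0]] * T

# Standard worst-case tuning (an assumption here, not the paper's method).
eta_worst_case = math.sqrt(8 * math.log(K) / T)
_, regret_tuned = run_hedge(easy, eta_worst_case)

# A much larger, arbitrarily chosen learning rate for comparison.
_, regret_large = run_hedge(easy, 5.0)

print(regret_tuned, regret_large)
```

On this toy instance the larger learning rate commits to the better action sooner and so incurs less regret, which is the kind of gap on easy instances that an adaptive learning rate aims to close without giving up the worst-case guarantee.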
Connectivity is a central notion of graph theory and plays an important role in graph algorithm design and applications. With emerging new applications in networks, a new type of graph connectivity problem has been getting more attention: hedge connectivity. In this paper, we consider the model of hedge graphs without hedge overlaps, where edges are partitioned into subsets called hedges that fail together. The hedge connectivity of a graph is the minimum number of hedges whose removal disconnects the graph. This model is more general than the hypergraph model, which brings new computational challenges. It has long been an open problem whether hedge connectivity can be computed in polynomial time. In this paper, we study the combinatorial properties of hedge graph connectivity without hedge overlaps, based on its extremal conditions as well as hedge contraction operations, which provide new insights into its algorithmic progress.
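The definition of hedge connectivity can be made concrete with a small brute-force sketch. It only illustrates the definition on tiny graphs and runs in exponential time; it is not an algorithm from the paper. The function names, the dictionary representation of hedges, and the example graph are assumptions made for illustration.

```python
from itertools import combinations

def is_connected(n, edges):
    """Check whether the undirected graph on vertices 0..n-1 is connected (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1

def hedge_connectivity(n, hedges):
    """Smallest number of hedges whose removal disconnects the graph.

    `hedges` maps a hedge name to its list of edges; every edge belongs to
    exactly one hedge (no overlaps). Tries all subsets in order of size.
    """
    names = list(hedges)
    for k in range(len(names) + 1):
        for removed in combinations(names, k):
            remaining = [e for h in names if h not in removed for e in hedges[h]]
            if not is_connected(n, remaining):
                return k
    return None  # the graph cannot be disconnected (single vertex)

# Hypothetical example: the 4-cycle 0-1-2-3-0.
# Two opposite edges share hedge "a", so they fail together.
grouped = {"a": [(0, 1), (2, 3)], "b": [(1, 2)], "c": [(3, 0)]}
# Same cycle with every edge as its own hedge (ordinary edge connectivity).
singletons = {"e0": [(0, 1)], "e1": [(1, 2)], "e2": [(2, 3)], "e3": [(3, 0)]}

print(hedge_connectivity(4, grouped), hedge_connectivity(4, singletons))
```

In the grouped version, removing the single hedge "a" deletes two opposite edges at once and splits the cycle, whereas with one edge per hedge two removals are needed.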
In transfer learning, we wish to make inference about a target population when we have access to data both from the target distribution itself, and from a different but related source distribution. We introduce a flexible framework for transfer learning in the context of binary classification, allowing for covariate-dependent relationships between the source and target distributions that are not required to preserve the Bayes decision boundary. Our main contributions are to derive the minimax optimal rates of convergence (up to poly-logarithmic factors) in this problem, and show that the optimal rate can be achieved by an algorithm that adapts to key aspects of the unknown transfer relationship, as well as the smoothness and tail parameters of our distributional classes. This optimal rate turns out to have several regimes, depending on the interplay between the relative sample sizes and the strength of the transfer relationship, and our algorithm achieves optimality by careful, decision tree-based calibration of local nearest-neighbour procedures.
Many traditional signal recovery approaches perform well when based on the penalized likelihood. However, they face the difficulty of selecting the hyperparameters or tuning parameters in the penalties. In this article, we propose a global adaptive generative adjustment (GAGA) algorithm for signal recovery, in which multiple hyperparameters are automatically learned and updated alternately with the signal. We further prove that the output of our algorithm directly guarantees the consistency of model selection and the asymptotic normality of the signal estimate. Moreover, we propose a variant of the GAGA algorithm that improves computational efficiency in high-dimensional data analysis. Finally, in simulated experiments, we examine the consistency of the outputs of our algorithms and compare our algorithms to other penalized likelihood methods: the Adaptive LASSO, the SCAD and the MCP. The simulation results support the efficiency of our algorithms for signal recovery, and demonstrate that our algorithms outperform the other algorithms.
We develop a data-driven approach to perform clustering and end-to-end feature learning simultaneously for streaming data that can adaptively detect novel clusters in emerging data. Our approach, Adaptive Nonparametric Variational Autoencoder (AdapVAE), learns the cluster membership through a Bayesian Nonparametric (BNP) modeling framework with Deep Neural Networks (DNNs) for feature learning. We develop a joint online variational inference algorithm to learn feature representations and clustering assignments simultaneously by iteratively optimizing the Evidence Lower Bound (ELBO). We resolve the catastrophic forgetting challenges (Kirkpatrick et al., 2017) with streaming data by adopting generative samples from the AdapVAE trained on previous data, which avoids the need to store and reuse past data. We demonstrate the advantages of our model, including adaptive novel cluster detection without discarding useful information learned from past data, high-quality sample generation, and clustering performance comparable to end-to-end batch-mode clustering methods on both image and text corpora benchmark datasets.
While machine learning techniques have been successfully applied in several fields, the black-box nature of the models presents challenges for interpreting and explaining the results. We develop a new framework called Adaptive Explainable Neural Networks (AxNN) for achieving the dual goals of good predictive performance and model interpretability. For predictive performance, we build a structured neural network made up of ensembles of generalized additive model networks and additive index models (through explainable neural networks) using a two-stage process. This can be done using either a boosting or a stacking ensemble. For interpretability, we show how to decompose the results of AxNN into main effects and higher-order interaction effects. The computations are inherited from Google's open-source tool AdaNet and can be efficiently accelerated by training with distributed computing. The results are illustrated on simulated and real datasets.