This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods on CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results for one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides types of results and insights that differ from those of statistical learning theory.
Generalization in deep learning has been the topic of much recent theoretical and empirical research. Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning. Such predictions should 1) scale correctly with data complexity; 2) scale correctly with training set size; 3) capture differences between architectures; 4) capture differences between optimization algorithms; 5) be quantitatively not too far from the true error (in particular, be non-vacuous); 6) be efficiently computable; and 7) be rigorous. We focus on generalization error upper bounds, and introduce a categorization of bounds depending on assumptions about the algorithm and data. We review a wide range of existing approaches, from classical VC dimension to recent PAC-Bayesian bounds, commenting on how well they perform against the desiderata. We next use a function-based picture to derive a marginal-likelihood PAC-Bayesian bound. This bound is, by one definition, optimal up to a multiplicative constant in the asymptotic limit of large training sets, as long as the learning curve follows a power law, which is typically found in practice for deep learning problems. Extensive empirical analysis demonstrates that our marginal-likelihood PAC-Bayes bound fulfills desiderata 1-3 and 5. The results for desiderata 6 and 7 are promising but not yet fully conclusive, while only desideratum 4 is currently beyond the scope of our bound. Finally, we comment on why this function-based bound performs significantly better than current parameter-based PAC-Bayes bounds.
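For context, bounds of this kind instantiate the standard PAC-Bayesian template (here in the Langford-Seeger/Maurer form); the following is the generic statement, not the paper's specific marginal-likelihood bound. With probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$, for any data-independent prior $P$ and every posterior $Q$ over hypotheses,

\[
\mathrm{kl}\!\left(\hat{R}_S(Q) \,\middle\|\, R(Q)\right) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m},
\]

where $\hat{R}_S(Q)$ is the empirical risk, $R(Q)$ the true risk, and $\mathrm{kl}$ the binary KL divergence. Roughly speaking, a marginal-likelihood bound takes $Q$ to be $P$ restricted to the functions consistent with the training data, in which case $\mathrm{KL}(Q \,\|\, P)$ reduces to the negative log marginal likelihood of the data under the prior.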
Algorithmic decision making now affects many aspects of our lives. Standard tools for machine learning, such as classification and regression, are subject to bias in the data, so direct application of such off-the-shelf tools can lead to a specific group being unfairly discriminated against. Removing sensitive attributes from the data does not solve this problem, because a disparate impact can arise when non-sensitive and sensitive attributes are correlated. Here, we study a fair machine learning algorithm that avoids such a disparate impact when making a decision. Inspired by the two-stage least squares method widely used in economics, we propose a two-stage algorithm that removes bias from the training data. The proposed algorithm is conceptually simple. Unlike most existing fair algorithms, which are designed for classification tasks, the proposed method is able to (i) deal with regression tasks, (ii) combine explanatory attributes to remove reverse discrimination, and (iii) deal with numerical sensitive attributes. The performance and fairness of the proposed algorithm are evaluated in simulations with synthetic and real-world datasets.
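A minimal sketch of the two-stage idea under stated assumptions: scikit-learn, a single numerical sensitive attribute, and linear stage-one models; the helper names and synthetic data are illustrative, not the authors' implementation. Stage one regresses the non-sensitive features on the sensitive attribute and keeps the residuals, which carry no linear dependence on it; stage two fits the target regression on those residuals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def debias_features(X, s):
    """Stage 1: regress each non-sensitive feature on the sensitive
    attribute(s) and return the residuals, which are linearly
    uncorrelated with s."""
    stage1 = LinearRegression().fit(s, X)
    return X - stage1.predict(s)

# Illustrative synthetic data: features correlated with a numerical
# sensitive attribute s (the disparate-impact channel).
rng = np.random.default_rng(0)
s = rng.normal(size=(200, 1))
X = 2.0 * s + rng.normal(size=(200, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=200)

# Stage 2: fit the target regression on the residualized features.
X_tilde = debias_features(X, s)
model = LinearRegression().fit(X_tilde, y)
```

Because the residuals are orthogonal to the sensitive attribute, the stage-two model cannot pick up the linear disparate impact routed through correlated features; in the spirit of point (ii) above, designated explanatory attributes would be left unresidualized.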
We formulate the problem of sampling and recovering clustered graph signals as a multi-armed bandit (MAB) problem. This formulation lends itself naturally to learning sampling strategies using the well-known gradient MAB algorithm. In particular, the sampling strategy is represented as a probability distribution over the individual arms of the MAB and optimized using gradient ascent. Some illustrative numerical experiments indicate that the sampling strategies based on the gradient MAB algorithm outperform existing sampling methods.
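A minimal sketch of the gradient MAB update such a strategy rests on, assuming each arm indexes a candidate sampling set and a hypothetical `reward_fn` scores recovery quality; this is the textbook gradient bandit algorithm, not the paper's exact procedure.

```python
import numpy as np

def gradient_bandit(reward_fn, n_arms, n_steps=1000, alpha=0.1, seed=0):
    """Gradient multi-armed bandit: softmax preferences over arms,
    updated by stochastic gradient ascent on expected reward."""
    rng = np.random.default_rng(seed)
    H = np.zeros(n_arms)        # arm preferences
    baseline = 0.0              # running average reward
    for t in range(1, n_steps + 1):
        pi = np.exp(H - H.max())
        pi /= pi.sum()          # softmax distribution over arms
        a = rng.choice(n_arms, p=pi)
        r = reward_fn(a)        # e.g. recovery quality of sampling set a
        baseline += (r - baseline) / t
        # Gradient ascent step: raise preferences of arms whose
        # rewards beat the baseline, lower the rest.
        one_hot = np.eye(n_arms)[a]
        H += alpha * (r - baseline) * (one_hot - pi)
    return pi
```

The returned softmax distribution over arms is precisely the learned sampling strategy: a probability distribution over candidate sampling sets, shaped by gradient ascent toward arms that yield good recovery.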
Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality, and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability to a different population, and learn using less data. EAML presents a novel framework for high-performance and dependable machine learning in critical applications.
We present an approach, based on learning an intrinsic data manifold, for the initialization of the internal state values of LSTM recurrent neural networks, ensuring consistency with the initial observed input data. Exploiting the generalized synchronization concept, we argue that the converged, mature internal states constitute a function on this learned manifold. The dimension of this manifold then dictates the length of observed input time series data required for consistent initialization. We illustrate our approach through a partially observed chemical model system, where initializing the internal LSTM states in this fashion yields visibly improved performance. Finally, we show that learning this data manifold enables the transformation of partially observed dynamics into fully observed ones, facilitating alternative identification paths for nonlinear dynamical systems.