No Arabic abstract
Recently, there has been a surge of interest in representation learning in hyperbolic spaces, driven by their ability to represent hierarchical data with significantly fewer dimensions than standard Euclidean spaces. However, the viability and benefits of hyperbolic spaces for downstream machine learning tasks have received less attention. In this paper, we present, to our knowledge, the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space. Specifically, we consider the problem of learning a large-margin classifier for data possessing a hierarchical structure. Our first contribution is a hyperbolic perceptron algorithm, which provably converges to a separating hyperplane. We then provide an algorithm to efficiently learn a large-margin hyperplane, relying on the careful injection of adversarial examples. Finally, we prove that for hierarchical data that embeds well into hyperbolic space, the low embedding dimension ensures superior guarantees when learning the classifier directly in hyperbolic space.
Representing data in hyperbolic space can effectively capture latent hierarchical relationships. With the goal of enabling accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and elucidate through new theoretical work its connection to the Euclidean counterpart. We demonstrate the performance improvement of hyperbolic SVM for multi-class prediction tasks on real-world complex networks as well as simulated datasets. Our work allows analytic pipelines that take the inherent hyperbolic geometry of the data into account in an end-to-end fashion without resorting to ill-fitting tools developed for Euclidean space.
We consider the problem of cost sensitive multiclass classification, where we would like to increase the sensitivity of an important class at the expense of a less important one. We adopt an {em apportioned margin} framework to address this problem, which enables an efficient margin shift between classes that share the same boundary. The decision boundary between all pairs of classes divides the margin between them in accordance to a given prioritization vector, which yields a tighter error bound for the important classes while also reducing the overall out-of-sample error. In addition to demonstrating an efficient implementation of our framework, we derive generalization bounds, demonstrate Fisher consistency, adapt the framework to Mercers kernel and to neural networks, and report promising empirical results on all accounts.
This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs) that are probably the most prominent machine learning models for large-scale data in the community over the past decade. We generalize the formulation of classification margins from classical research to latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and introduce recent efforts in enlarging the margins for DNNs comprehensively. Since the viewpoint of different methods is discrepant, we categorize them into groups for ease of comparison and discussion in the paper. Hopefully, our discussions and overview inspire new research work in the community that aim to improve the performance of DNNs, and we also point to directions where the large margin principle can be verified to provide theoretical evidence why certain regularizations for DNNs function well in practice. We managed to shorten the paper such that the crucial spirit of large margin learning and related methods are better emphasized.
In this paper, we reformulate the forest representation learning approach as an additive model which boosts the augmented feature instead of the prediction. We substantially improve the upper bound of generalization gap from $mathcal{O}(sqrtfrac{ln m}{m})$ to $mathcal{O}(frac{ln m}{m})$, while $lambda$ - the margin ratio between the margin standard deviation and the margin mean is small enough. This tighter upper bound inspires us to optimize the margin distribution ratio $lambda$. Therefore, we design the margin distribution reweighting approach (mdDF) to achieve small ratio $lambda$ by boosting the augmented feature. Experiments and visualizations confirm the effectiveness of the approach in terms of performance and representation learning ability. This study offers a novel understanding of the cascaded deep forest from the margin-theory perspective and further uses the mdDF approach to guide the layer-by-layer forest representation learning.
The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task under highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined in tree graphs, for which we prove consistency, thus being the first losses shown to be consistent for Max-Margin beyond the binary setting. We finally address these limitations by correcting the concept of Max-Margin and introducing the Restricted-Max-Margin, where the maximization of the loss-augmented scores is maintained, but performed over a subset of the original domain. The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.