Do you want to publish a course? Click here

k-Mixup Regularization for Deep Learning via Optimal Transport

134   0   0.0 ( 0 )
 Added by Kristjan Greenewald
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

Mixup is a popular regularization technique for training deep neural networks that can improve generalization and increase adversarial robustness. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup to emph{$k$-mixup} by perturbing $k$-batches of training points in the direction of other $k$-batches using displacement interpolation, interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying efficacy of standard mixup. Our empirical results show that training with $k$-mixup further improves generalization and robustness on benchmark datasets.



rate research

Read More

Regularization in Optimal Transport (OT) problems has been shown to critically affect the associated computational and sample complexities. It also has been observed that regularization effectively helps in handling noisy marginals as well as marginals with unequal masses. However, existing works on OT restrict themselves to $phi$-divergences based regularization. In this work, we propose and analyze Integral Probability Metric (IPM) based regularization in OT problems. While it is expected that the well-established advantages of IPMs are inherited by the IPM-regularized OT variants, we interestingly observe that some useful aspects of $phi$-regularization are preserved. For example, we show that the OT formulation, where the marginal constraints are relaxed using IPM-regularization, also lifts the ground metric to that over (perhaps un-normalized) measures. Infact, the lifted metric turns out to be another IPM whose generating set is the intersection of that of the IPM employed for regularization and the set of 1-Lipschitz functions under the ground metric. Also, in the special case where the regularization is squared maximum mean discrepancy based, the proposed OT variant, as well as the corresponding Barycenter formulation, turn out to be those of minimizing a convex quadratic subject to non-negativity/simplex constraints and hence can be solved efficiently. Simulations confirm that the optimal transport plans/maps obtained with IPM-regularization are intrinsically different from those obtained with $phi$-regularization. Empirical results illustrate the efficacy of the proposed IPM-regularized OT formulation. This draft contains the main paper and the Appendices.
Metric learning involves learning a discriminative representation such that embeddings of similar classes are encouraged to be close, while embeddings of dissimilar classes are pushed far apart. State-of-the-art methods focus mostly on sophisticated loss functions or mining strategies. On the one hand, metric learning losses consider two or more examples at a time. On the other hand, modern data augmentation methods for classification consider two or more examples at a time. The combination of the two ideas is under-studied. In this work, we aim to bridge this gap and improve representations using mixup, which is a powerful data augmentation approach interpolating two or more examples and corresponding target labels at a time. This task is challenging because, unlike classification, the loss functions used in metric learning are not additive over examples, so the idea of interpolating target labels is not straightforward. To the best of our knowledge, we are the first to investigate mixing examples and target labels for deep metric learning. We develop a generalized formulation that encompasses existing metric learning loss functions and modify it to accommodate for mixup, introducing Metric Mix, or Metrix. We show that mixing inputs, intermediate representations or embeddings along with target labels significantly improves representations and outperforms state-of-the-art metric learning methods on four benchmark datasets.
198 - Kyle Swanson , Lili Yu , Tao Lei 2020
Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs. However, directly applying OT often produces dense and therefore uninterpretable alignments. To overcome this limitation, we introduce novel constrained variants of the OT problem that result in highly sparse alignments with controllable sparsity. Our model is end-to-end differentiable using the Sinkhorn algorithm for OT and can be trained without any alignment annotations. We evaluate our model on the StackExchange, MultiNews, e-SNLI, and MultiRC datasets. Our model achieves very sparse rationale selections with high fidelity while preserving prediction accuracy compared to strong attention baseline models.
Inverse optimal transport (OT) refers to the problem of learning the cost function for OT from observed transport plan or its samples. In this paper, we derive an unconstrained convex optimization formulation of the inverse OT problem, which can be further augmented by any customizable regularization. We provide a comprehensive characterization of the properties of inverse OT, including uniqueness of solutions. We also develop two numerical algorithms, one is a fast matrix scaling method based on the Sinkhorn-Knopp algorithm for discrete OT, and the other one is a learning based algorithm that parameterizes the cost function as a deep neural network for continuous OT. The novel framework proposed in the work avoids repeatedly solving a forward OT in each iteration which has been a thorny computational bottleneck for the bi-level optimization in existing inverse OT approaches. Numerical results demonstrate promising efficiency and accuracy advantages of the proposed algorithms over existing state-of-the-art methods.
211 - Behnam Neyshabur 2017
In an attempt to better understand generalization in deep learning, we study several possible explanations. We show that implicit regularization induced by the optimization method is playing a key role in generalization and success of deep learning models. Motivated by this view, we study how different complexity measures can ensure generalization and explain how optimization algorithms can implicitly regularize complexity measures. We empirically investigate the ability of these measures to explain different observed phenomena in deep learning. We further study the invariances in neural networks, suggest complexity measures and optimization algorithms that have similar invariances to those in neural networks and evaluate them on a number of learning tasks.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا