A geometry where everything is better than nice



Added by Peter Gibson

Publication date: 2016

Authors: Larry Bates, Peter Gibson

Field: Differential Geometry

Language: English

Abstract

We present a Riemannian structure on the disk with remarkably rich properties. Its geodesics are hypocycloids, and the negative of the Laplacian has an integer spectrum whose multiplicities are given by the Dirichlet divisor function. The eigenfunctions of the Laplacian are orthogonal polynomials that are naturally suited to the analysis of acoustic scattering in layered media.
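The spectral claim can be made concrete with a little arithmetic. The sketch below assumes, as one reading of the abstract, that each positive integer n is an eigenvalue of the negative Laplacian with multiplicity d(n), the number of positive divisors of n; the indexing from n = 1 and the helper names are assumptions for illustration. It tabulates multiplicities and the eigenvalue counting function N(λ) = Σ_{n≤λ} d(n), computed both directly and through the classical identity Σ_{n≤x} d(n) = Σ_{k≤x} ⌊x/k⌋.

```python
def divisor_count(n: int) -> int:
    """Dirichlet divisor function d(n): the number of positive divisors of n."""
    count = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            # i and n // i form a divisor pair; a square root is counted once
            count += 1 if i * i == n else 2
        i += 1
    return count


def multiplicities(max_eigenvalue: int) -> dict[int, int]:
    """Assumed multiplicity d(n) of each integer eigenvalue n = 1..max_eigenvalue."""
    return {n: divisor_count(n) for n in range(1, max_eigenvalue + 1)}


def counting_function(lam: float) -> int:
    """N(lam): eigenvalues <= lam counted with multiplicity, i.e. sum of d(n) for n <= lam."""
    return sum(divisor_count(n) for n in range(1, int(lam) + 1))


def counting_function_by_floors(lam: float) -> int:
    """Same count via sum_{k <= x} floor(x / k): k divides exactly floor(x / k) integers up to x."""
    x = int(lam)
    return sum(x // k for k in range(1, x + 1))
```

For example, counting_function(10) and counting_function_by_floors(10) both return 27 (1 + 2 + 2 + 3 + 2 + 4 + 2 + 4 + 3 + 4). Under the same assumption, Dirichlet's classical estimate Σ_{n≤x} d(n) = x log x + (2γ − 1)x + O(√x) describes how quickly eigenvalues accumulate.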

Is Attention Better Than Matrix Decomposition?

Zhengyang Geng, Meng-Hao Guo, Hongxu Chen (2021)

As an essential ingredient of modern deep learning, the attention mechanism, especially self-attention, plays a vital role in discovering global correlations. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago in terms of performance and computational cost for encoding long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module, self-attention, when the gradients back-propagated through the MDs are handled carefully. Comprehensive experiments on vision tasks where learning the global context is crucial, including semantic segmentation and image generation, demonstrate significant improvements over self-attention and its variants.

Computer Vision and Pattern Recognition; Machine Learning
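To make the "factorize, then reconstruct a low-rank embedding" step concrete, here is a sketch of one matrix decomposition a Hamburger-style block could use: nonnegative matrix factorization (NMF) with the classical Lee-Seung multiplicative updates, run for a small fixed number of steps. The function name nmf_ham, the random initialization, and the step count are assumptions for illustration; the linear layers around the decomposition and the paper's handling of gradients through the MD are not modeled. The input is assumed nonnegative (for example, features after a ReLU).

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def elementwise_update(m, num, den, eps):
    """Lee-Seung multiplicative rule, entry by entry: m * num / (den + eps)."""
    return [[v * n / (d + eps) for v, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def nmf_ham(x, rank, steps=6, eps=1e-6, seed=0):
    """Hypothetical helper: factor a nonnegative d x n matrix x as D (d x rank)
    times C (rank x n) with a few multiplicative updates, then return the
    low-rank reconstruction D @ C."""
    rng = random.Random(seed)
    d, n = len(x), len(x[0])
    # strictly positive start, since multiplicative updates keep zeros at zero
    dmat = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(d)]
    cmat = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(rank)]
    for _ in range(steps):
        dt = transpose(dmat)
        # C <- C * (D^T X) / (D^T D C)
        cmat = elementwise_update(cmat, matmul(dt, x), matmul(matmul(dt, dmat), cmat), eps)
        ct = transpose(cmat)
        # D <- D * (X C^T) / (D C C^T)
        dmat = elementwise_update(dmat, matmul(x, ct), matmul(dmat, matmul(cmat, ct)), eps)
    return matmul(dmat, cmat)
```

For an exactly rank-1 input, such as the outer product of [1, 2, 3] and [0.5, 1, 1.5, 2], nmf_ham(x, rank=1) should return a close reconstruction: with a single component, each multiplicative update reduces (up to the eps safeguard) to an exact least-squares solve.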

Is Local SGD Better than Minibatch SGD?

Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich (2020)

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking, and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives, we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) for general convex objectives, we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) we show that local SGD does not, in fact, dominate minibatch SGD, by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.

Machine Learning; Optimization and Control
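A toy simulation can make the two algorithms being compared concrete. The sketch below runs both on a one-dimensional quadratic f(w) = ½(w − w*)² with Gaussian gradient noise and gives each round the same number of stochastic gradients: local SGD lets M workers take K steps each from the shared iterate and then averages their iterates, while minibatch SGD spends the same M·K gradients on a single averaged step. The objective, noise model, step size, and function names are all assumptions; this illustrates the mechanics, not the paper's bounds.

```python
import random


def noisy_grad(w, w_star, sigma, rng):
    """Stochastic gradient of f(w) = 0.5 * (w - w_star)**2 with additive Gaussian noise."""
    return (w - w_star) + rng.gauss(0.0, sigma)


def local_sgd(w0, w_star, workers, local_steps, rounds, lr, sigma, seed=0):
    """Each worker takes `local_steps` SGD steps from the shared iterate; then iterates are averaged."""
    rng = random.Random(seed)
    w = w0
    for _ in range(rounds):
        finals = []
        for _ in range(workers):
            wm = w
            for _ in range(local_steps):
                wm -= lr * noisy_grad(wm, w_star, sigma, rng)
            finals.append(wm)
        w = sum(finals) / workers  # communication round: average the workers' iterates
    return w


def minibatch_sgd(w0, w_star, workers, local_steps, rounds, lr, sigma, seed=0):
    """Same gradient budget per round, but all workers * local_steps gradients are
    evaluated at the shared iterate and averaged into one step."""
    rng = random.Random(seed)
    w = w0
    batch = workers * local_steps
    for _ in range(rounds):
        g = sum(noisy_grad(w, w_star, sigma, rng) for _ in range(batch)) / batch
        w -= lr * g
    return w
```

With, say, 8 workers, 10 local steps, 50 rounds, step size 0.1 and w* = 3, both runs should end near w*. Note that with a shared step size, local SGD moves K steps toward the optimum per round while minibatch SGD moves one, so a fair comparison would also tune the minibatch step size.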

Why there is something rather than nothing (out of everything)?

A. O. Barvinsky (2007)

The path integral over Euclidean geometries for the recently suggested density matrix of the Universe is shown to describe a microcanonical ensemble in quantum cosmology. This ensemble corresponds to a uniform (weight one) distribution in the phase space of true physical variables, but in terms of the observable spacetime geometry it is peaked about complex saddle points of the Lorentzian path integral. These saddle points are represented by the recently obtained cosmological instantons, limited to a bounded range of the cosmological constant. Inflationary cosmologies generated by these instantons undergo, at late stages of expansion, an acceleration whose low-energy scale can be attained within the concept of dynamically evolving extra dimensions. Thus, together with the bounded range of the early cosmological constant, this cosmological ensemble suggests a mechanism for constraining the landscape of string vacua and, simultaneously, a possible solution to the dark energy problem in the form of the quasi-equilibrium decay of the microcanonical state of the Universe.

High Energy Physics - Theory

Fast is better than free: Revisiting adversarial training

Eric Wong, Leslie Rice, J. Zico Kolter (2020)

Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon = 8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon = 2/255$ in 12 hours, in comparison to past work based on free adversarial training, which took 10 and 50 hours, respectively, to reach the same thresholds. Finally, we identify a failure mode referred to as catastrophic overfitting, which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper, as well as pretrained model weights, is available at https://github.com/locuslab/fast_adversarial.

Machine Learning
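The central recipe here, an FGSM step started from a random point inside the $\epsilon$-ball instead of from the clean input, can be sketched on a linear logistic model whose input gradient has the closed form (p − y)·w. The model, the function names, and the plain-list representation are assumptions for illustration; in the actual method the gradient comes from backpropagation through a deep network and the perturbed input is then used for a training step, which is omitted here. The step size alpha is left as a parameter; values somewhat larger than eps keep the random start from cancelling the signed step, but the exact choice is an assumption of this sketch.

```python
import math
import random


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def _prob(w, b, x):
    """Sigmoid output of the linear logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, b, x, y):
    """Binary cross-entropy at input x with label y in {0, 1}."""
    p = _prob(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    """d loss / d x for the logistic model: (p - y) * w."""
    p = _prob(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm_random_init(w, b, x, y, eps, alpha, rng):
    """One FGSM step from a uniformly random start in the l-infinity eps-ball;
    inputs are assumed to live in [0, 1]."""
    delta = [rng.uniform(-eps, eps) for _ in x]
    # keep the random start inside the valid input range: x + delta in [0, 1]
    delta = [min(max(d, -xi), 1.0 - xi) for d, xi in zip(delta, x)]
    grad = input_gradient(w, b, [xi + d for xi, d in zip(x, delta)], y)
    # signed-gradient ascent step, then project back onto the eps-ball and input range
    delta = [min(max(d + alpha * sign(g), -eps), eps) for d, g in zip(delta, grad)]
    delta = [min(max(d, -xi), 1.0 - xi) for d, xi in zip(delta, x)]
    return delta
```

The returned perturbation always satisfies |delta_i| ≤ eps, and for this linear model the signed step pushes the logit in the loss-increasing direction, so the adversarial loss should exceed the clean loss for inputs away from the [0, 1] boundary.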

EigenGame Unloaded: When playing games is better than optimizing

Ian Gemp, Brian McWilliams, Claire Vernade (2021)

We build on the recently proposed EigenGame, which views eigendecomposition as a competitive game. EigenGame's updates are biased if computed using minibatches of data, which hinders convergence and more sophisticated parallelism in the stochastic setting. In this work, we propose an unbiased stochastic update that is asymptotically equivalent to EigenGame, enjoys greater parallelism allowing computation on datasets with larger sample sizes, and outperforms EigenGame in experiments. We present applications to finding the principal components of massive datasets and performing spectral clustering of graphs. We analyze and discuss our proposed update in the context of EigenGame and the shift in perspective from optimization to games.

Machine Learning; Artificial Intelligence
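A minimal sketch of an unbiased EigenGame-style update, assuming it takes the form $\Delta_i = M v_i - \sum_{j<i} (v_i^\top M v_j)\, v_j$ for a symmetric matrix $M$ (for example, a data covariance), followed by projection onto the tangent space of the unit sphere and renormalization. This exact form is an assumption about the paper's update, and the helper names are hypothetical. Because $\Delta_i$ is linear in $M$, replacing $M$ with a minibatch estimate gives, for fixed vectors, an unbiased estimate of the full update, which is the property the abstract emphasizes.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def eigengame_step(m, vs, lr):
    """Update all players in parallel; player i estimates the i-th eigenvector of m."""
    mv = [matvec(m, v) for v in vs]  # M v_j for every player, computed once
    updated = []
    for i, v in enumerate(vs):
        # reward term M v_i minus a penalty toward each parent j < i
        grad = list(mv[i])
        for j in range(i):
            coeff = dot(v, mv[j])  # v_i^T M v_j
            grad = [g - coeff * p for g, p in zip(grad, vs[j])]
        # project onto the tangent space of the unit sphere at v_i
        radial = dot(grad, v)
        grad = [g - radial * x for g, x in zip(grad, v)]
        updated.append(normalize([x + lr * g for x, g in zip(v, grad)]))
    return updated


def run_eigengame(m, init, steps=500, lr=0.1):
    """Iterate the parallel update from the given (unnormalized) starting vectors."""
    vs = [normalize(v) for v in init]
    for _ in range(steps):
        vs = eigengame_step(m, vs, lr)
    return vs
```

For $M = \mathrm{diag}(3, 2, 1)$ and a generic starting pair such as [1, 1, 1] and [1, −1, 1], the two players should approach ±e₁ and ±e₂ respectively. In the stochastic setting, m would be replaced at each step by a minibatch estimate such as $X_b^\top X_b / b$, which the linearity argument above covers.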