
A geometry where everything is better than nice

 Added by Peter Gibson
 Publication date 2016
 Language: English





We present a remarkably rich Riemannian structure on the disk. Its geodesics are hypocycloids, and the (negative of the) Laplacian has integer spectrum with multiplicity given by the Dirichlet divisor function. The eigenfunctions of the Laplacian are orthogonal polynomials naturally suited to the analysis of acoustic scattering in layered media.
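The two classical objects named in the abstract can be illustrated with a short sketch. It is not taken from the paper: the abstract does not say which integers occur as eigenvalues or how the geodesic hypocycloids are parametrized, so the function names `divisor_count` and `hypocycloid_point` and the radii `R` and `r` are illustrative assumptions. The code uses only the standard definitions: d(n) counts the positive divisors of n, and a hypocycloid is traced by a point on a circle of radius r rolling inside a circle of radius R.

```python
import math


def divisor_count(n: int) -> int:
    """Dirichlet divisor function d(n): the number of positive divisors of n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    count = 0
    # Divisors come in pairs (k, n // k) with k <= sqrt(n).
    for k in range(1, math.isqrt(n) + 1):
        if n % k == 0:
            count += 1 if k * k == n else 2
    return count


def hypocycloid_point(R: float, r: float, theta: float) -> tuple[float, float]:
    """Point on the hypocycloid traced by a circle of radius r rolling
    inside a fixed circle of radius R (standard parametrization;
    R and r are hypothetical parameters, not values from the paper)."""
    ratio = (R - r) / r
    x = (R - r) * math.cos(theta) + r * math.cos(ratio * theta)
    y = (R - r) * math.sin(theta) - r * math.sin(ratio * theta)
    return x, y


if __name__ == "__main__":
    # Multiplicity pattern d(1), ..., d(12), assuming eigenvalues indexed by n.
    print([divisor_count(n) for n in range(1, 13)])
    # A few points of the four-cusped hypocycloid (astroid) in the unit disk.
    print([hypocycloid_point(1.0, 0.25, t * math.pi / 4) for t in range(4)])
```

Every point of such a curve stays inside the closed disk of radius R, since its distance from the origin is at most (R - r) + r.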


As an essential ingredient of modern deep learning, the attention mechanism, especially self-attention, plays a vital role in global correlation discovery. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago in terms of performance and computational cost when encoding long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module self-attention when carefully coping with gradients back-propagated through MDs. Comprehensive experiments are conducted on vision tasks where it is crucial to learn the global context, including semantic segmentation and image generation, demonstrating significant improvements over self-attention and its variants.
We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
148 - A.O.Barvinsky 2007
The path integral over Euclidean geometries for the recently suggested density matrix of the Universe is shown to describe a microcanonical ensemble in quantum cosmology. This ensemble corresponds to a uniform (weight one) distribution in phase space of true physical variables, but in terms of the observable spacetime geometry it is peaked about complex saddle-points of the Lorentzian path integral. They are represented by the recently obtained cosmological instantons limited to a bounded range of the cosmological constant. Inflationary cosmologies generated by these instantons at late stages of expansion undergo acceleration whose low-energy scale can be attained within the concept of dynamically evolving extra dimensions. Thus, together with the bounded range of the early cosmological constant, this cosmological ensemble suggests the mechanism of constraining the landscape of string vacua and, simultaneously, a possible solution to the dark energy problem in the form of the quasi-equilibrium decay of the microcanonical state of the Universe.
Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on free adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as catastrophic overfitting which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial.
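The idea of FGSM combined with random initialization can be sketched on a toy model. This is a minimal illustration and not the code from the linked repository: the logistic-regression model, the helper names, and the step size `alpha` are all assumptions introduced here (the abstract does not specify a step size). The sketch draws a random starting perturbation inside the $\epsilon$-ball, takes one signed-gradient step, and projects back onto the ball.

```python
import math
import random


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x,
    for a toy linear model (hypothetical stand-in for a deep network).
    The label y is 0 or 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm_random_init(w, b, x, y, eps, alpha, rng):
    """One FGSM step from a random start inside the L-infinity ball of radius eps.
    alpha is an assumed step-size parameter."""
    # 1. Random initialization of the perturbation.
    delta = [rng.uniform(-eps, eps) for _ in x]
    # 2. Single signed-gradient ascent step, evaluated at the perturbed input.
    x_pert = [xi + di for xi, di in zip(x, delta)]
    grad = loss_grad_wrt_input(w, b, x_pert, y)
    delta = [di + alpha * sign(gi) for di, gi in zip(delta, grad)]
    # 3. Project back onto the eps-ball.
    return [max(-eps, min(eps, di)) for di in delta]


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 8 / 255
    delta = fgsm_random_init([1.0, -2.0, 0.5], 0.1, [0.2, 0.4, -0.1], 1,
                             eps, 1.25 * eps, rng)
    print(delta)
```

In adversarial training, the model would then be updated on the perturbed input `x + delta` instead of `x`; that training loop is omitted here.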
We build on the recently proposed EigenGame that views eigendecomposition as a competitive game. EigenGame's updates are biased if computed using minibatches of data, which hinders convergence and more sophisticated parallelism in the stochastic setting. In this work, we propose an unbiased stochastic update that is asymptotically equivalent to EigenGame, enjoys greater parallelism allowing computation on datasets of larger sample sizes, and outperforms EigenGame in experiments. We present applications to finding the principal components of massive datasets and performing spectral clustering of graphs. We analyze and discuss our proposed update in the context of EigenGame and the shift in perspective from optimization to games.