
The Brownian motion in the transformer model

Posted by Yingshi Chen
Publication date: 2021
Research field: Informatics Engineering
Paper language: English
Author: Yingshi Chen





The Transformer is the state-of-the-art model for many language and vision tasks. In this paper, we give an in-depth analysis of its multi-head self-attention (MHSA) module and find that: 1) each token is a random variable in a high-dimensional feature space; 2) after layer normalization, these variables are mapped to points on a hypersphere; 3) the update of these tokens is a Brownian motion. Brownian motion has special properties: its second-order term should not be ignored. We therefore present a new second-order optimizer (an iterative K-FAC algorithm) for the MHSA module. In short, all tokens are mapped onto a high-dimensional hypersphere, and the scaled dot-product attention $\mathrm{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d}}\right)$ is simply the Markov transition matrix of a random walk on that sphere. The deep learning process learns a suitable kernel function that places these tokens at appropriate positions. The training process in the MHSA module corresponds to a Brownian motion that is worthy of further study.
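The two geometric claims in the abstract (layer-normalized tokens lie on a sphere; scaled dot-product attention is a Markov transition matrix) can be checked numerically with a minimal sketch. It assumes LayerNorm without its learnable gain and bias, and identity query/key projections ($\mathbf{Q} = \mathbf{K} = \mathbf{X}$); both are simplifications for illustration, not the paper's setup, and all function names are hypothetical. Without the affine part, each normalized token has zero mean and unit variance, so its Euclidean norm is $\sqrt{d}$. Each softmax row is positive and sums to 1, so the attention matrix is row-stochastic.

```python
import math
import random

def layer_norm(x, eps=1e-12):
    """LayerNorm WITHOUT the learnable gain/bias (assumption): zero mean, unit variance."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(Q, K):
    """Row-wise softmax(Q K^T / sqrt(d)) for lists of d-dimensional vectors."""
    d = len(Q[0])
    scale = math.sqrt(d)
    return [softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K])
            for q in Q]

def attend(P, V):
    """Replace each token by the P-weighted average of V (expectation after one transition)."""
    dim = len(V[0])
    return [[sum(p * v[j] for p, v in zip(row, V)) for j in range(dim)] for row in P]

def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in x))

if __name__ == "__main__":
    random.seed(0)
    n, d = 5, 8
    # Hypothetical raw token features drawn from a standard normal distribution.
    X = [layer_norm([random.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    print([round(norm(x), 6) for x in X])      # every entry close to sqrt(8)
    P = attention_matrix(X, X)                 # assumption: Q = K = X
    print([round(sum(row), 12) for row in P])  # every row sums to 1
    print([round(norm(y), 3) for y in attend(P, X)])  # each at most sqrt(8)
```

Because each row of P is a probability vector, `attend` returns convex combinations of points on the sphere, which generally lie strictly inside the ball of radius $\sqrt{d}$.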




Read also

The theory of quantum Brownian motion describes the properties of a large class of open quantum systems. Nonetheless, its description in terms of a Born-Markov master equation, widely used in the literature, is known to violate the positivity of the density operator at very low temperatures. We study an extension of existing models, leading to an equation in the Lindblad form, which is free of this problem. We study the dynamics of the model, including the detailed properties of its stationary solution, for both constant and position-dependent coupling of the Brownian particle to the bath, focusing in particular on the correlations and the squeezing of the probability distribution induced by the environment.
We present a modified Brownian motion model for random matrices in which the eigenvalues (or levels) of a random matrix evolve in time in such a way that they never cross each other's paths. Also, owing to the exact integrability of the level dynamics, we incorporate long-time recurrences into the random walk problem underlying the Brownian motion. From this model, we derive the Coulomb interaction between the two eigenvalues. We further show that the Coulomb gas analogy fails if the confining potential $V(E)$ is a transcendental function such that there exist orthogonal polynomials with weighting function $\exp[-\beta E]$, where $\beta$ is a symmetry parameter.
Tian Qiu, H. T. Quan, 2020
The quantum Brownian motion model is a typical model in the study of nonequilibrium quantum thermodynamics. Entropy is one of the most fundamental physical concepts in thermodynamics. In this work, by solving the quantum Langevin equation, we study the von Neumann entropy of a particle undergoing quantum Brownian motion. In both the strong and the weak coupling regimes, we obtain the analytical expression for the time evolution of the Wigner function in terms of the initial Wigner function. The result is applied to the thermodynamic equilibrium initial state, which reproduces its classical counterpart in the high-temperature limit. Based on these results, for initial states that have well-defined classical counterparts, we obtain explicit expressions for the quantum corrections to the entropy of the system. Moreover, under the Markovian approximation, we obtain expressions for the quantum corrections to the total entropy production rate $e_{\mathrm{p}}$ and the heat dissipation rate $h_{\mathrm{d}}$. Our results bring important insights into the understanding of entropy in open quantum systems.
Chi-Chun Zhou, Ping Zhang, 2020
A Brownian particle in an ideal quantum gas is considered, and its mean square displacement (MSD) is derived. The Bose-Einstein or Fermi-Dirac distribution, rather than the Maxwell-Boltzmann distribution, provides a stochastic force different from that of classical Brownian motion. The MSD, which depends on the thermal wavelength and the density of the medium particles, explicitly reflects the quantum effect on the Brownian particle. The result shows that the MSD in an ideal Bose gas is smaller than that in a Fermi gas. The behavior of the quantum Brownian particle recovers that of the classical Brownian particle as the temperature rises. At low temperatures, the quantum effect becomes pronounced. For example, the Brownian particle undergoes random motion due to the fermionic exchange interaction even when the temperature is near absolute zero.
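For comparison, the classical limit mentioned above can be sketched with a free-diffusion simulation. For a classical Brownian particle with diffusion constant $D$ in `dim` dimensions, the MSD grows linearly, $\mathrm{MSD}(t) = 2\,\mathrm{dim}\,D\,t$. The sketch below assumes an Euler discretization with independent Gaussian increments of variance $2D\,\Delta t$ per coordinate; the function names and parameter values are hypothetical, and it models only the classical case, not the quantum-gas corrections derived in the paper.

```python
import math
import random

def simulate_msd(n_particles, n_steps, dt, D, dim=2, seed=0):
    """Mean square displacement after each step, averaged over independent particles.

    Each coordinate receives an independent Gaussian increment with variance
    2 * D * dt per step (an assumed Euler discretization of free diffusion).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * dt)
    positions = [[0.0] * dim for _ in range(n_particles)]
    msd = []
    for _ in range(n_steps):
        for p in positions:
            for k in range(dim):
                p[k] += rng.gauss(0.0, sigma)
        # Average squared distance from the origin (all particles start there).
        msd.append(sum(sum(c * c for c in p) for p in positions) / n_particles)
    return msd

def classical_msd(t, D, dim=2):
    """Classical free-diffusion result: MSD(t) = 2 * dim * D * t."""
    return 2.0 * dim * D * t

if __name__ == "__main__":
    dt, D = 0.01, 0.5
    msd = simulate_msd(n_particles=2000, n_steps=100, dt=dt, D=D)
    for step in (25, 50, 100):
        t = step * dt
        print(f"t={t:.2f}  simulated={msd[step - 1]:.3f}  theory={classical_msd(t, D):.3f}")
```

The simulated values fluctuate around the linear law with a statistical error that shrinks as `n_particles` grows.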
We investigate the classical Brownian motion of a particle in a two-dimensional noncommutative (NC) space. Using the standard NC algebra embodied by the symplectic Weyl-Moyal formalism, we find that noncommutativity induces a non-vanishing correlation between the two coordinates at different times. The effect stands out as a signature of spatial noncommutativity and could thus offer a way to detect the phenomenon experimentally. We further discuss some limiting scenarios and the trade-off between the scale imposed by the NC structure and the parameters of the Brownian motion itself.
