
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention

Published by: Bao Wang
Publication date: 2021
Research field: Informatics Engineering
Language: English





We propose FMMformers, a class of efficient and flexible transformers inspired by the celebrated fast multipole method (FMM) for accelerating interacting particle simulation. FMM decomposes particle-particle interaction into near-field and far-field components and then performs direct and coarse-grained computation, respectively. Similarly, FMMformers decompose the attention into near-field and far-field attention, modeling the near-field attention by a banded matrix and the far-field attention by a low-rank matrix. Computing the attention matrix for FMMformers has linear complexity in computational time and memory footprint with respect to the sequence length. In contrast, standard transformers suffer from quadratic complexity. We analyze and validate the advantage of FMMformers over the standard transformer on the Long Range Arena and language modeling benchmarks. FMMformers can even outperform the standard transformer in terms of accuracy by a significant margin. For instance, FMMformers achieve an average classification accuracy of $60.74\%$ over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of $58.70\%$.
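The near-field/far-field split described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract only states that near-field attention is banded and far-field attention is low-rank, so everything else here is an assumption made for illustration: the function names (`near_field`, `far_field`, `fmm_attention`), the `elu(x) + 1` feature map used to obtain a low-rank kernel, and the fixed blend weights `w_near` and `w_far`.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def elu_plus_one(x):
    # Assumed positive feature map: elu(x) + 1 (not specified in the abstract).
    return x + 1.0 if x > 0 else math.exp(x)

def near_field(Q, K, V, bandwidth):
    """Banded softmax attention: token i attends only to j with |i - j| <= bandwidth."""
    n, d, dv = len(Q), len(Q[0]), len(V[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - bandwidth), min(n, i + bandwidth + 1)
        scores = [dot(Q[i], K[j]) / math.sqrt(d) for j in range(lo, hi)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * V[j][c] for w, j in zip(weights, range(lo, hi))) / z
                    for c in range(dv)])
    return out

def far_field(Q, K, V):
    """Low-rank (kernelized) attention computed in time linear in sequence length."""
    d, dv = len(Q[0]), len(V[0])
    # Summaries shared by every query: S = sum_j phi(k_j) v_j^T, z = sum_j phi(k_j).
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k_row, v_row in zip(K, V):
        pk = [elu_plus_one(x) for x in k_row]
        for a in range(d):
            z[a] += pk[a]
            for c in range(dv):
                S[a][c] += pk[a] * v_row[c]
    out = []
    for q_row in Q:
        pq = [elu_plus_one(x) for x in q_row]
        norm = dot(pq, z)  # strictly positive because the feature map is positive
        out.append([sum(pq[a] * S[a][c] for a in range(d)) / norm for c in range(dv)])
    return out

def fmm_attention(Q, K, V, bandwidth=1, w_near=0.5, w_far=0.5):
    """Blend of banded and low-rank attention; the fixed weights are an assumption."""
    near = near_field(Q, K, V, bandwidth)
    far = far_field(Q, K, V)
    return [[w_near * a + w_far * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(near, far)]
```

For a sequence of length n, head dimension d, and bandwidth b, the banded part costs on the order of n·b·d operations and the low-rank part on the order of n·d², so neither term forms the full n × n attention matrix. This is the sense in which the cost is linear in sequence length.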




Read also

We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems on heterogeneous diffusion networks. Our new framework leverages the Mori-Zwanzig formalism to obtain an exact evolution equation of the individual node infection probabilities, which renders a delay differential equation with memory integral approximated by learnable time convolution operators. Directly using information diffusion cascade data, our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities. Connections between parameter learning and optimal control are also established, leading to a rigorous and implementable algorithm for training NMF. Moreover, we show that the projected gradient descent method can be employed to solve the challenging influence maximization problem, where the gradient is computed extremely fast by integrating NMF forward in time just once in each iteration. Extensive empirical studies show that our approach is versatile and robust to variations of the underlying diffusion network models, and significantly outperforms existing approaches in accuracy and efficiency on both synthetic and real-world data.
Far-field directional scattering and near-field directional coupling from simple sources have recently received great attention in photonics: beyond circularly-polarized dipoles, whose directional coupling to evanescent waves was recently applied to acoustics, the near-field directionality of modes in optics includes phased combinations of electric and magnetic dipoles, such as the Janus dipole and the Huygens dipole, both of which have been experimentally implemented using high refractive index nanoparticles. In this work we extend this to acoustics: we propose the use of high acoustic index scatterers exhibiting phased combinations of acoustic monopoles and dipoles with far-field and near-field directionality. All solutions stem from the elegant acoustic angular spectrum of the acoustic source, in close analogy to electromagnetism. A Huygens acoustic source with zero backward scattering is proposed and numerically demonstrated, as well as a Janus source achieving face-selective and position-dependent evanescent coupling to nearby acoustic waveguides.
We describe an efficient near-field to far-field transformation for optical quasinormal modes, which are the dissipative modes of open cavities and plasmonic resonators with complex eigenfrequencies. As an application of the theory, we show how one can compute the reservoir modes (or regularized quasinormal modes) outside the resonator, which are essential to use in both classical and quantum optics. We subsequently demonstrate how to efficiently compute the quantum optical parameters necessary in the theory of quantized quasinormal modes [Franke et al., Phys. Rev. Lett. 122, 213901 (2019)]. To confirm the accuracy of our technique, we directly compare with a Dyson equation approach currently used in the literature (in regimes where this is possible), and demonstrate several orders of magnitude improvement in the calculation run times. We also introduce an efficient pole approximation for computing the quantized quasinormal mode parameters, since they require an integration over a range of frequencies. Using this approach, we show how to compute regularized quasinormal modes and quantum optical parameters for a full 3D metal dimer in under one minute on a standard desktop computer. Our technique is exemplified by studying the quasinormal modes of metal dimers and a hybrid structure consisting of a gold dimer on top of a photonic crystal beam. In the latter example, we show how to compute the quantum optical parameters that describe a pronounced Fano resonance, using structural geometries that cannot practically be solved using a Dyson equation approach. All calculations for the spontaneous emission rates are confirmed with full-dipole calculations in Maxwell's equations and are shown to be in excellent agreement.
Moming Duan, Duo Liu, Xinyuan Ji (2020)
Federated Learning (FL) enables multiple participating devices to collaboratively contribute to a global neural network model while keeping the training data local. Unlike the centralized training setting, the non-IID and imbalanced (statistical heterogeneity) training data of FL is distributed in the federated network, which will increase the divergences between the local models and global model, further degrading performance. In this paper, we propose a novel clustered federated learning (CFL) framework FedGroup, in which we 1) group the training of clients based on the similarities between the clients' optimization directions for high training performance; 2) construct a new data-driven distance measure to improve the efficiency of the client clustering procedure; 3) implement a newcomer device cold start mechanism based on the auxiliary global model for framework scalability and practicality. FedGroup can achieve improvements by dividing joint optimization into groups of sub-optimization and can be combined with FL optimizer FedProx. The convergence and complexity are analyzed to demonstrate the efficiency of our proposed framework. We also evaluate FedGroup and FedGrouProx (combined with FedProx) on several open datasets and make comparisons with related CFL frameworks. The results show that FedGroup can significantly improve absolute test accuracy by +14.1% on FEMNIST compared to FedAvg, +3.4% on Sentiment140 compared to FedProx, and +6.9% on MNIST compared to FeSEM.
Deep neural networks have been shown to be a class of useful tools for addressing signal recognition issues in recent years, especially for identifying the nonlinear feature structures of signals. However, the power of most deep learning techniques relies heavily on an abundant amount of training data, so the performance of classic neural nets decreases sharply when the number of training data samples is small or unseen data are presented in the testing phase. This calls for an advanced strategy, i.e., model-agnostic meta-learning (MAML), which is able to capture the invariant representation of the data samples or signals. In this paper, inspired by the special structure of the signal, i.e., the real and imaginary parts present in practical time-series signals, we propose a Complex-valued Attentional MEta Learner (CAMEL) for the problem of few-shot signal recognition by leveraging attention and meta-learning in the complex domain. To the best of our knowledge, this is also the first complex-valued MAML that can find the first-order stationary points of general nonconvex problems with theoretical convergence guarantees. Extensive experimental results showcase the superiority of the proposed CAMEL compared with the state-of-the-art methods.
