Compared to monolingual models, cross-lingual models usually require a more expressive vocabulary to represent all languages adequately. We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity. To this end, we propose an algorithm, VoCap, to determine the desired vocabulary capacity of each language. However, increasing the vocabulary size significantly slows down the pre-training speed. To address this issue, we propose k-NN-based target sampling to accelerate the expensive softmax. Our experiments show that the multilingual vocabulary learned with VoCap benefits cross-lingual language model pre-training. Moreover, k-NN-based target sampling mitigates the side effects of increasing the vocabulary size while achieving comparable performance and faster pre-training speed. The code and the pre-trained multilingual vocabularies are available at https://github.com/bozheng-hit/VoCapXLM.
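The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea: instead of normalizing over the full multilingual vocabulary, the softmax is restricted to the k candidates nearest the hidden state plus the gold target. The function name and shapes are illustrative, and a real system would retrieve candidates with an approximate nearest-neighbor index rather than the exhaustive scoring used here for brevity.

```python
import torch
import torch.nn.functional as F

def knn_sampled_softmax_loss(hidden, output_emb, target, k=256):
    """Approximate softmax loss over a large vocabulary.

    Normalizes over only the k candidates closest to the hidden state
    (plus the gold target), shrinking the softmax from O(|V|) to O(k).
    hidden:     (batch, dim)  final hidden states
    output_emb: (|V|, dim)    output embedding matrix
    target:     (batch,)      gold token ids
    """
    logits_full = hidden @ output_emb.T                 # (batch, |V|)
    # k-NN candidate set per example; a production system would query
    # an ANN index (e.g., FAISS) instead of scoring all of |V|.
    topk_idx = logits_full.topk(k, dim=-1).indices      # (batch, k)
    # Always include the gold target among the candidates (a possible
    # duplicate of the gold id is tolerated in this sketch).
    cand = torch.cat([topk_idx, target.unsqueeze(-1)], dim=-1)
    cand_logits = logits_full.gather(-1, cand)          # (batch, k+1)
    # The gold target sits at the last position of each candidate list.
    gold_pos = torch.full_like(target, cand.size(-1) - 1)
    return F.cross_entropy(cand_logits, gold_pos)

V, d = 50000, 64
loss = knn_sampled_softmax_loss(
    torch.randn(8, d), torch.randn(V, d), torch.randint(V, (8,)))
print(loss)
```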
Nodes in networks may have one or more functions that determine their role in the system. As opposed to local proximity, which captures the local context of nodes, role identity captures the functional role that nodes play in a network, such as being the center of a group or the bridge between two groups. This means that nodes far apart in a network can have similar structural role identities. Several recent works have explored methods for embedding the roles of nodes in networks. However, these methods all rely on either approximating or indirectly modeling structural equivalence. In this paper, we present a novel and flexible framework that uses stress majorization to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) into a low-dimensional embedding space. Our method is also flexible in that it does not rely on a specific structural similarity definition. We evaluate our method on the tasks of node classification, clustering, and visualization, using three real-world and five synthetic networks. Our experiments show that our framework outperforms existing methods in learning node role representations.
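As a minimal sketch of the core step (not the authors' implementation), stress majorization can be applied directly to a precomputed distance matrix over role-identity vectors; scikit-learn's MDS runs the SMACOF stress-majorization algorithm. The role features below are synthetic placeholders standing in for whatever structural descriptor one plugs in.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: one high-dimensional structural descriptor per
# node (e.g., degree statistics of its k-hop neighborhood).
rng = np.random.default_rng(0)
role_features = rng.random((50, 32))       # 50 nodes, 32-dim identities

# Pairwise distances between role identities; any structural
# similarity definition can be swapped in here.
D = squareform(pdist(role_features, metric="euclidean"))

# Stress majorization (SMACOF) maps the distance matrix directly to a
# low-dimensional embedding that preserves those distances.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
embedding = mds.fit_transform(D)           # (50, 2) role embeddings
print(embedding.shape)
```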
Recent years have seen a rise in the development of representation learning methods for graph data. Most of these methods, however, focus on node-level representation learning at various scales (e.g., microscopic, mesoscopic, and macroscopic node embedding). In comparison, methods for representation learning on whole graphs are relatively scarce. In this paper, we propose a novel unsupervised whole-graph embedding method. Our method uses spectral graph wavelets to capture topological similarities between nodes on each k-hop sub-graph and uses them to learn embeddings for the whole graph. We evaluate our method against 12 well-known baselines on 4 real-world datasets and show that it achieves the best performance across all experiments, outperforming the current state-of-the-art by a considerable margin.
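A rough sketch of the ingredient named here, under simplifying assumptions: spectral graph wavelets can be computed from the Laplacian eigendecomposition, and the per-node wavelet coefficients summarized into one fixed-size vector per graph. The moment-based aggregation below is an illustrative stand-in for the paper's actual aggregation.

```python
import numpy as np
import networkx as nx

def wavelet_graph_embedding(G, scales=(0.5, 1.0, 2.0), n_moments=4):
    """Whole-graph descriptor from spectral graph wavelets.

    For each scale s, the heat wavelet matrix is
    U diag(exp(-s * lam)) U^T; its entries are summarized by a few
    statistical moments, concatenated across scales into one vector.
    """
    L = nx.normalized_laplacian_matrix(G).toarray()
    lam, U = np.linalg.eigh(L)                 # Laplacian spectrum
    feats = []
    for s in scales:
        W = U @ np.diag(np.exp(-s * lam)) @ U.T   # wavelet coefficients
        for p in range(1, n_moments + 1):
            feats.append(np.mean(W ** p))          # graph-level moments
    return np.array(feats)

emb = wavelet_graph_embedding(nx.karate_club_graph())
print(emb.shape)   # (len(scales) * n_moments,)
```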
By using Lanczos exact diagonalization and quantum Monte Carlo combined with stochastic analytic continuation, we study the dynamical properties of the $S=1$ antiferromagnetic Heisenberg chain with different strengths of bond disorder. In the weak disorder region, we find that weakly coupled bonds can induce additional low-energy excitations below the one-magnon mode. As the disorder increases, the average Haldane gap closes at $\delta_{\Delta}\sim 0.5$, with more and more low-energy excitations coming out. Beyond the critical disorder strength $\delta_c\sim 1$, the system reaches a random-singlet phase in which the dynamic spin structure factor shows a prominent sharp peak at $\omega=0$ and a broad continuum at $\omega>0$. In addition, we analyze the distribution of random spin domains and numerically find three kinds of domains hosting effective spin-1/2 quanta or spin-1 sites in between. These spins can form weakly coupled long-range singlets due to quantum fluctuations, which contribute to the sharp peak at $\omega=0$.
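As a rough illustration of the exact-diagonalization side only (a minimal sketch, not the paper's Lanczos/QMC code), one can build a small periodic $S=1$ chain with random bonds and read off a finite-size gap with a sparse eigensolver. The box disorder distribution below is an assumption, not necessarily the distribution used in the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity, kron
from scipy.sparse.linalg import eigsh

# Spin-1 operators (3x3)
Sz = csr_matrix(np.diag([1.0, 0.0, -1.0]))
Sp = csr_matrix(np.sqrt(2.0) * np.diag([1.0, 1.0], k=1))
Sm = Sp.T.tocsr()

def site_op(op, i, L):
    """Embed a single-site operator at site i of an L-site chain."""
    return kron(kron(identity(3 ** i), op),
                identity(3 ** (L - i - 1)), format="csr")

def finite_size_gap(L=8, delta=0.2, seed=0):
    """Gap of a periodic S=1 Heisenberg chain with random bonds,
    drawn here (illustratively) from a box [1-delta, 1+delta]."""
    rng = np.random.default_rng(seed)
    J = rng.uniform(max(0.0, 1.0 - delta), 1.0 + delta, size=L)
    H = csr_matrix((3 ** L, 3 ** L))
    for i in range(L):                      # periodic: bond (L-1, 0)
        j = (i + 1) % L
        H = H + J[i] * (site_op(Sz, i, L) @ site_op(Sz, j, L)
              + 0.5 * (site_op(Sp, i, L) @ site_op(Sm, j, L)
                     + site_op(Sm, i, L) @ site_op(Sp, j, L)))
    E = np.sort(eigsh(H, k=2, which="SA", return_eigenvectors=False))
    return E[1] - E[0]

print(finite_size_gap())
```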
Given an unlabeled graph $G$ on $n$ vertices, let $\{N_{G}(v)\}_{v}$ be the collection of subgraphs of $G$, where for each vertex $v$ of $G$, $N_{G}(v)$ is the subgraph of $G$ induced by the vertices of $G$ at distance at most one from $v$. We show that there are universal constants $C,c>0$ with the following property. Let the sequence $(p_n)_{n=1}^\infty$ satisfy $n^{-1/2}\log^C n\leq p_n\leq c$. For each $n$, let $\Gamma_n$ be an unlabeled $G(n,p_n)$ Erdős-Rényi graph. Then with probability $1-o(1)$, any unlabeled graph $\tilde\Gamma_n$ on $n$ vertices with $\{N_{\tilde\Gamma_n}(v)\}_{v}=\{N_{\Gamma_n}(v)\}_{v}$ must coincide with $\Gamma_n$. This establishes $p_n=\tilde\Theta(n^{-1/2})$ as the transition for the density parameter $p_n$ between reconstructability and non-reconstructability of Erdős-Rényi graphs from their $1$-neighborhoods, answering a question of Gaudio and Mossel.
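To make the object of study concrete, here is a small sketch (with hypothetical helper names) that computes the unlabeled collection $\{N_G(v)\}_v$ and checks by brute-force isomorphism matching whether two graphs share the same deck; the theorem says that in the stated density regime this deck determines the graph with high probability.

```python
import networkx as nx

def neighborhood_deck(G):
    """The collection {N_G(v)}_v: for each vertex, the subgraph
    induced by the vertex together with its neighbors."""
    return [G.subgraph([v] + list(G.neighbors(v))).copy() for v in G]

def same_deck(G, H):
    """Match the two decks card-by-card up to isomorphism.
    Quadratically many isomorphism tests: illustration only."""
    deck_h = neighborhood_deck(H)
    for card in neighborhood_deck(G):
        hit = next((i for i, c in enumerate(deck_h)
                    if nx.is_isomorphic(card, c)), None)
        if hit is None:
            return False
        deck_h.pop(hit)
    return True

G = nx.gnp_random_graph(30, 0.3, seed=1)
print(same_deck(G, G))   # a graph trivially matches its own deck
```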
Computing a Nash equilibrium in bimatrix games is PPAD-hard, and many works have focused on approximate solutions. When games are generated from a fixed but unknown distribution, learning a Nash predictor via data-driven approaches can be preferable. In this paper, we study the learnability of approximate Nash equilibria in bimatrix games. We prove that the Lipschitz function class is agnostically Probably Approximately Correct (PAC) learnable with respect to the Nash approximation loss. Additionally, to demonstrate the advantages of learning a Nash predictor, we develop a model that can efficiently approximate solutions for games drawn from the same distribution. We show by experiments that the solutions from our Nash predictor can serve as effective initialization points for other Nash solvers.
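The abstract does not define its Nash approximation loss here; a standard choice (assumed in this sketch) is the exploitability of a strategy profile, i.e., the larger of the two players' incentives to deviate, which is zero exactly at a Nash equilibrium.

```python
import numpy as np

def nash_approx_loss(A, B, x, y):
    """Exploitability of the mixed-strategy profile (x, y) in the
    bimatrix game (A, B): max regret over the two players."""
    regret_row = np.max(A @ y) - x @ A @ y   # row player's regret
    regret_col = np.max(x @ B) - x @ B @ y   # column player's regret
    return max(regret_row, regret_col)

rng = np.random.default_rng(0)
A, B = rng.random((3, 3)), rng.random((3, 3))
x = y = np.ones(3) / 3                       # uniform strategies
print(nash_approx_loss(A, B, x, y))          # 0 iff (x, y) is exact Nash
```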
Deep reinforcement learning is a branch of machine learning in which an agent learns to act based on the environment state in order to maximize its total reward. Deep reinforcement learning provides a good opportunity to model the complexity of portfolio choice in a high-dimensional, data-driven environment by leveraging the powerful representations of deep neural networks. In this paper, we build a portfolio management system that uses direct deep reinforcement learning to make optimal portfolio choices periodically among S&P 500 constituent stocks by learning a good factor representation (as input). The results show that effectively learning market conditions and optimal portfolio allocations can significantly outperform the average market.
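A minimal sketch of the direct-RL setup described here, with an illustrative architecture and sizes (not the paper's): a policy network maps the factor representation straight to long-only portfolio weights, and training maximizes portfolio return.

```python
import torch
import torch.nn as nn

class PortfolioPolicy(nn.Module):
    """Direct RL policy: factor representation of the market state in,
    portfolio weights over n_assets stocks out."""
    def __init__(self, n_factors, n_assets, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_factors, hidden), nn.ReLU(),
            nn.Linear(hidden, n_assets),
        )

    def forward(self, factors):
        # Softmax enforces non-negative weights that sum to one.
        return torch.softmax(self.net(factors), dim=-1)

policy = PortfolioPolicy(n_factors=16, n_assets=10)
weights = policy(torch.randn(1, 16))
# Direct RL objective: maximize the period's portfolio return, e.g.,
# loss = -(weights * asset_returns).sum(), then backpropagate.
print(weights.sum())   # tensor(1.) up to floating-point error
```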
Deep neural networks (DNNs) are known to perform well when deployed on test distributions that share high similarity with the training distribution. Feeding DNNs sequentially with new data unseen in the training distribution raises two major challenges: fast adaptation to new tasks and catastrophic forgetting of old tasks. Such difficulties have paved the way for ongoing research on few-shot learning and continual learning. To tackle these problems, we introduce Attentive Independent Mechanisms (AIM). We incorporate the idea of learning with fast and slow weights in conjunction with decoupling a DNN's feature extraction from its higher-order conceptual learning. AIM is designed for higher-order conceptual learning, modeled by a mixture of experts that compete to learn independent concepts to solve a new task. AIM is a modular component that can be inserted into existing deep learning frameworks. We demonstrate its capability for few-shot learning by adding it to SIB and training on MiniImageNet and CIFAR-FS, showing significant improvement. AIM is also applied to ANML and OML, trained on Omniglot, CIFAR-100, and MiniImageNet, to demonstrate its capability in continual learning. Code is publicly available at https://github.com/huang50213/AIM-Fewshot-Continual.
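A minimal sketch of a competing-experts layer in the spirit of AIM, not the authors' exact module (see their repository for that): each expert owns a key, a sparse attention over the keys selects the top-k experts, and only their outputs are mixed. The class name, sizes, and top-k competition rule are all illustrative.

```python
import torch
import torch.nn as nn

class AttentiveMixture(nn.Module):
    """Experts compete via attention: only the top_k experts whose
    keys best match the input get to process it."""
    def __init__(self, dim, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_experts))
        self.keys = nn.Parameter(torch.randn(n_experts, dim))
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, dim)
        scores = x @ self.keys.T                # (batch, n_experts)
        topv, topi = scores.topk(self.top_k, dim=-1)
        # Sparse gate: softmax over the winners, zero elsewhere.
        gate = torch.zeros_like(scores).scatter_(
            -1, topi, torch.softmax(topv, dim=-1))
        outs = torch.stack([e(x) for e in self.experts], dim=1)
        return (gate.unsqueeze(-1) * outs).sum(dim=1)

layer = AttentiveMixture(dim=32)
print(layer(torch.randn(4, 32)).shape)   # torch.Size([4, 32])
```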
Deep neural networks suffer from catastrophic forgetting when learning multiple pieces of knowledge sequentially, and a growing number of approaches have been proposed to mitigate this problem. Some of these methods achieve considerable performance by associating flat local minima with forgetting mitigation in continual learning. However, they inevitably require (1) tedious hyperparameter tuning and (2) additional computational cost. To alleviate these problems, we propose a simple yet effective optimization method, called AlterSGD, that searches for flat minima in the loss landscape. In AlterSGD, we conduct gradient descent and ascent alternately once the network tends to converge in each session of learning new knowledge. Moreover, we theoretically prove that such a strategy can encourage the optimization to converge to a flat minimum. We verify AlterSGD on continual learning benchmarks for semantic segmentation, and the empirical results show that it significantly mitigates forgetting and outperforms the state-of-the-art methods by a large margin under challenging continual learning protocols.
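A minimal sketch of the alternating update (the paper's switching schedule is not reproduced, so the caller toggles the direction): a plain SGD step whose sign flips between descent and ascent.

```python
import torch

def altersgd_epoch(model, loss_fn, loader, lr, ascend=False):
    """One epoch of plain SGD with an optional sign flip.

    AlterSGD-style training alternates descent and *ascent* once the
    network is close to convergence on the current session; here the
    caller decides when to set ascend=True."""
    for x, y in loader:
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
        step = lr if ascend else -lr     # ascent climbs, descent falls
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is not None:
                    p += step * p.grad
```

Intuitively, an occasional ascent step pushes the iterate out of sharp minima it can easily climb out of, so alternating the two biases convergence toward flatter regions of the loss landscape.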
The advancement of convolutional neural networks (CNNs) on various vision applications has attracted much attention. Yet the majority of CNNs are unable to satisfy the strict requirements of real-world deployment. To overcome this, the recently popular network pruning is an effective method for reducing the redundancy of models. However, the ranking of filters by importance may be inconsistent across different pruning criteria: one filter may be important according to a certain criterion yet unnecessary according to another, which indicates that each criterion offers only a partial view of the comprehensive importance. Motivated by this, we propose a novel framework that integrates existing filter pruning criteria by exploiting their diversity. The proposed framework contains two stages: Criteria Clustering and Filter Importance Calibration. First, we condense the pruning criteria via layerwise clustering based on the rank of their importance scores. Second, within each cluster, we propose a calibration factor to adjust the significance of each selected blending candidate and search for the optimal blending criterion via an Evolutionary Algorithm. Quantitative results on the CIFAR-100 and ImageNet benchmarks show that our framework outperforms the state-of-the-art baselines with regard to the compact model's performance after pruning.
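A minimal sketch of the clustering stage under stated assumptions (synthetic scores, an illustrative cluster count, and uniform weights standing in for the evolutionary search): criteria are represented by the ranks they assign to filters, clustered, and blended within a cluster.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical importance scores from 4 pruning criteria (e.g., L1
# norm, L2 norm, geometric median, Taylor) for 64 filters in a layer.
rng = np.random.default_rng(0)
scores = rng.random((4, 64))

# Represent each criterion by the *rank* it assigns to every filter,
# so criteria that order filters similarly end up close together.
ranks = scores.argsort(axis=1).argsort(axis=1)       # (4, 64)

# Layerwise clustering of the criteria.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(ranks)
print(labels)                                        # cluster id per criterion

# Within a cluster, a blended criterion combines its members with
# calibration factors alpha_i; uniform weights stand in for the
# values an Evolutionary Algorithm would search for.
members = ranks[labels == labels[0]]
alpha = np.ones(len(members)) / len(members)
blended = (alpha[:, None] * members).sum(axis=0)     # one score per filter
```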