Sampling is a critical operation in the training of Graph Neural Networks (GNNs) that helps reduce the cost. Previous works have explored improving sampling algorithms through mathematical and statistical methods. However, there is a gap between sampling algorithms and hardware. Without considering hardware, algorithm designers merely optimize sampling at the algorithm level, missing the great potential of improving the efficiency of existing sampling algorithms by leveraging hardware features. In this paper, we first propose a unified programming model for mainstream sampling algorithms, termed GNNSampler, covering the key processes of sampling algorithms in various categories. Second, we explore the data locality among nodes and their neighbors (i.e., a hardware-level feature) in real-world datasets to alleviate the irregular memory accesses in sampling. Third, we implement locality-aware optimizations in GNNSampler for diverse sampling algorithms to optimize the general sampling process in GNN training. Finally, we conduct experiments on large graph datasets to analyze the relationship between training time, model accuracy, and hardware-level metrics, which helps achieve a good trade-off between time and accuracy in GNN training. Extensive experimental results show that our method is universal to mainstream sampling algorithms and reduces GNN training time (ranging from 4.83% with layer-wise sampling to 44.92% with subgraph-based sampling) with comparable accuracy.
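To make the locality idea concrete, below is a minimal sketch of a locality-aware neighbor sampler, assuming a simple adjacency-list graph; the function name, the ID-proximity window heuristic, and all parameters are illustrative assumptions rather than the GNNSampler implementation.

import numpy as np

def locality_aware_sample(adj_list, node, fanout, window=1024, rng=None):
    """Sample up to `fanout` neighbors of `node`, preferring neighbors whose IDs
    fall within `window` of `node`, so the gathered feature rows are more likely
    to share cache lines and memory pages (better data locality)."""
    rng = rng or np.random.default_rng()
    neighbors = np.asarray(adj_list[node])
    if len(neighbors) <= fanout:
        return neighbors
    near = neighbors[np.abs(neighbors - node) <= window]   # locality-friendly candidates
    far = neighbors[np.abs(neighbors - node) > window]
    if len(near) >= fanout:
        return rng.choice(near, fanout, replace=False)
    return np.concatenate([near, rng.choice(far, fanout - len(near), replace=False)])

# Toy usage: sample 5 neighbors of node 42.
adj = {42: list(range(30, 60)) + [5000, 9000]}
print(locality_aware_sample(adj, 42, fanout=5))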
There has been experimental evidence for Majorana zero modes (MZMs) in solid-state systems, which are building blocks for potential topological quantum computing. It is important to design devices in which MZMs are easy to manipulate and which possess a broad topologically non-trivial parameter space for fusion and braiding. Here, we propose that the Majorana vortex states in iron-based superconducting nanowires fulfill these desirable conditions. This system has a radius-induced topological phase transition, giving a lower limit to the radius of the nanowire. In the topological phase, there is only one pair of MZMs in the nanowire over a wide range of radius, chemical potential, and external magnetic field. The wavefunction of the MZM has a sizable distribution at the side edge of the nanowire. This property enables one to control the interaction of the MZMs in neighboring vortex nanowires, and paves the way for Majorana fusion and braiding.
Xinxin Liu, Yu Hua, Rong Bai (2021)
Coalescing RDMA and Persistent Memory (PM) delivers high end-to-end performance for networked storage systems, which requires rethinking the design of efficient hash structures. In general, existing hashing schemes optimize RDMA and PM separately, thus only partially addressing the problems of RDMA access amplification and high-overhead PM consistency. To address these problems, we propose continuity hashing, a one-stone-two-birds design that optimizes both RDMA and PM. Continuity hashing leverages a fine-grained contiguous shared region, called SBuckets, to provide standby positions for the two neighbouring buckets in case of hash collisions. In continuity hashing, a remote read needs only a single RDMA read to directly fetch the home bucket and the neighbouring SBuckets, which contain all the positions that may hold a key-value item, thus alleviating RDMA access amplification. Continuity hashing further leverages indicators that can be atomically modified to support log-free PM consistency for all write operations. Evaluation results demonstrate that, compared with state-of-the-art schemes, continuity hashing achieves high throughput (i.e., 1.45X -- 2.43X improvement), low latency (about 1.7X speedup), and the smallest number of PM writes under various workloads, while maintaining acceptable load factors of about 70%.
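The layout idea can be illustrated with a small in-memory sketch: each pair of neighbouring buckets shares a contiguous SBuckets region of standby slots, so every position a key may occupy lies in one contiguous span (which a single RDMA read could cover). The slot counts, class, and method names below are illustrative assumptions; the actual scheme additionally involves RDMA verbs and PM-atomic indicator updates, which are omitted here.

SLOTS_PER_BUCKET = 4
SLOTS_PER_SBUCKET = 4

class ContinuityTableSketch:
    def __init__(self, num_buckets):
        self.buckets = [[None] * SLOTS_PER_BUCKET for _ in range(num_buckets)]
        # sbuckets[i] is the shared standby region between buckets i and i + 1
        self.sbuckets = [[None] * SLOTS_PER_SBUCKET for _ in range(num_buckets - 1)]

    def _regions(self, key):
        # Home bucket plus its adjacent shared regions; in the real design this
        # span is physically contiguous, so one RDMA read fetches all of it.
        b = hash(key) % len(self.buckets)
        regions = [self.buckets[b]]
        if b > 0:
            regions.append(self.sbuckets[b - 1])
        if b < len(self.sbuckets):
            regions.append(self.sbuckets[b])
        return regions

    def insert(self, key, value):
        for region in self._regions(key):
            for i, slot in enumerate(region):
                if slot is None:
                    region[i] = (key, value)  # the real scheme flips an atomic indicator here
                    return True
        return False  # a full implementation would resize instead

    def lookup(self, key):
        for region in self._regions(key):
            for slot in region:
                if slot is not None and slot[0] == key:
                    return slot[1]
        return None

table = ContinuityTableSketch(8)
table.insert("foo", 1)
print(table.lookup("foo"))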
Xin Liu, Henglin Shi, Haoyu Chen (2021)
We introduce a new dataset for emotional artificial intelligence research: an identity-free video dataset for Micro-Gesture Understanding and Emotion analysis (iMiGUE). Different from existing public datasets, iMiGUE focuses on nonverbal body gestures without using any identity information, whereas the predominant research on emotion analysis relies on sensitive biometric data such as faces and speech. Most importantly, iMiGUE focuses on micro-gestures, i.e., unintentional behaviors driven by inner feelings, which differ from the ordinary gestures in other gesture datasets that are mostly performed intentionally for illustrative purposes. Furthermore, iMiGUE is designed to evaluate the ability of models to analyze emotional states by integrating information from recognized micro-gestures, rather than just recognizing prototypes in the sequences separately (in isolation). This is because the real need of emotion AI is to understand the emotional states behind gestures in a holistic way. Moreover, to counter the challenge of the imbalanced sample distribution of this dataset, an unsupervised learning method is proposed to capture latent representations from the micro-gesture sequences themselves. We systematically investigate representative methods on this dataset, and comprehensive experimental results reveal several interesting insights from iMiGUE, e.g., micro-gesture-based analysis can promote emotion understanding. We confirm that the new iMiGUE dataset can advance studies of micro-gestures and emotion AI.
With the development of Internet of Things (IoT) and Artificial Intelligence (AI) technologies, human activity recognition has enabled various applications, such as smart homes and assisted living. In this paper, we target a new healthcare application of human activity recognition: early mobility recognition for Intensive Care Unit (ICU) patients. Early mobility is essential for ICU patients, who suffer from long-term immobilization. Our system includes accelerometer-based data collection from ICU patients and an AI model to recognize patients' early mobility. To improve the model accuracy and stability, we identify features that are insensitive to sensor orientations and propose a segment voting process that leverages a majority voting strategy to recognize each segment's activity. Our results show that our system improves model accuracy from 77.78% to 81.86% and reduces the model instability (standard deviation) from 16.69% to 6.92%, compared to the same AI model without our feature engineering and segment voting process.
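The segment-voting idea can be sketched in a few lines: split a recording into fixed-size segments, classify every window inside a segment, and let the majority label decide the segment's activity. The classifier, window grouping, and threshold below are placeholder assumptions, not the paper's model.

from collections import Counter

def segment_vote(windows, classify, windows_per_segment=10):
    """`windows` is a list of per-window feature vectors; `classify` maps one window to a label."""
    labels = []
    for start in range(0, len(windows), windows_per_segment):
        votes = [classify(w) for w in windows[start:start + windows_per_segment]]
        labels.append(Counter(votes).most_common(1)[0][0])  # majority vote per segment
    return labels

# Toy usage with a dummy classifier thresholding mean feature magnitude.
classify = lambda w: "active" if sum(w) / len(w) > 0.5 else "rest"
print(segment_vote([[0.9, 0.8], [0.2, 0.1], [0.7, 0.9]], classify, windows_per_segment=3))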
As an infamous invader of the North American ecosystem, the Asian giant hornet (Vespa mandarinia) is devastating not only to native bee colonies but also to local apiculture. One of the most effective ways to combat this harmful species is to locate and destroy its nests. By mobilizing the public to actively report possible sightings of the Asian giant hornet, the government could promptly send inspectors to confirm and possibly destroy the nests. However, such confirmation requires lab expertise, and manually checking the reports one by one consumes an enormous amount of human resources. Moreover, given the public's limited knowledge of the Asian giant hornet and the randomness of report submission, only a few of the numerous reports prove positive, i.e., indicate existing nests. How to classify or prioritize the reports efficiently and automatically, so as to determine the dispatch of personnel, is therefore of great significance to the control of the Asian giant hornet. In this paper, we propose a method to predict the priority of sighting reports based on machine learning. We model the problem of optimal prioritization of sighting reports as a problem of classification and prediction. We extract a variety of rich features from each report: location, time, image(s), and textual description. Based on these features, we propose a classification model based on logistic regression to predict the credibility of a given report. Furthermore, our model quantifies the influence between reports to obtain their priority ranking. Extensive experiments on the public dataset from the WSDA (the Washington State Department of Agriculture) demonstrate the effectiveness of our method.
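As a hedged sketch of the classification step, a logistic regression can be fit on per-report features and the reports ranked by predicted credibility; the feature columns and toy data below are illustrative assumptions, not the paper's actual features or the WSDA data.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-report features: latitude, longitude, month, has_image, text_score.
X = np.array([[48.9, -122.7, 6, 1, 0.8],
              [47.6, -122.3, 1, 0, 0.1],
              [48.8, -122.6, 7, 1, 0.7]])
y = np.array([1, 0, 1])                       # 1 = confirmed sighting

model = LogisticRegression().fit(X, y)
credibility = model.predict_proba(X)[:, 1]    # predicted probability of a true sighting
dispatch_order = np.argsort(-credibility)     # most credible reports first
print(dispatch_order, credibility)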
Decentralized optimization, particularly the class of decentralized composite convex optimization (DCCO) problems, has found many applications. Due to ubiquitous communication congestion and random dropouts in practice, it is highly desirable to design decentralized algorithms that can handle stochastic communication networks. However, most existing algorithms for DCCO only work in time-invariant networks and cannot be extended to stochastic networks because they inherently need knowledge of the network topology $\textit{a priori}$. In this paper, we propose a new decentralized dual averaging (DDA) algorithm that can solve DCCO in stochastic networks. Under a rather mild condition on stochastic networks, we show that the proposed algorithm attains $\textit{global linear convergence}$ if each local objective function is strongly convex. Our algorithm substantially improves the existing DDA-type algorithms, as the latter were only known to converge $\textit{sublinearly}$ prior to our work. The key to achieving the improved rate is the design of a novel dynamic averaging consensus protocol for DDA, which intuitively leads to more accurate local estimates of the global dual variable. To the best of our knowledge, this is the first linearly convergent DDA-type decentralized algorithm and also the first algorithm that attains global linear convergence for solving DCCO in stochastic networks. Numerical results are also presented to support our design and analysis.
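For orientation, a textbook-style decentralized dual averaging iteration for node $i$ reads as below; this is a generic sketch under standard assumptions, not the paper's dynamic averaging consensus protocol. Here $W^t$ is the (possibly random) mixing matrix of the network, $f_i$ the smooth local loss, $h$ the common nonsmooth regularizer, $d$ a strongly convex prox-function, and $\alpha_t$ a step size.

\begin{align*}
  z_i^{t+1} &= \sum_{j=1}^{n} W_{ij}^{t}\, z_j^{t} + \nabla f_i\big(x_i^{t}\big), \\
  x_i^{t+1} &= \operatorname*{arg\,min}_{x} \Big\{ \big\langle z_i^{t+1}, x \big\rangle + h(x) + \tfrac{1}{\alpha_{t+1}}\, d(x) \Big\}.
\end{align*}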
Detecting anomalies in dynamic graphs has drawn increasing attention due to their wide applications in social networks, e-commerce, and cybersecurity. Recent deep learning-based approaches have shown promising results over shallow methods. However, they fail to address two core challenges of anomaly detection in dynamic graphs: the lack of informative encoding for unattributed nodes and the difficulty of learning discriminative knowledge from coupled spatial-temporal dynamic graphs. To overcome these challenges, in this paper, we present a novel Transformer-based Anomaly Detection framework for DYnamic graphs (TADDY). Our framework constructs a comprehensive node encoding strategy to better represent each node's structural and temporal roles in an evolving graph stream. Meanwhile, TADDY captures informative representations from dynamic graphs with coupled spatial-temporal patterns via a dynamic graph transformer model. Extensive experimental results demonstrate that our proposed TADDY framework outperforms the state-of-the-art methods by a large margin on four real-world datasets.
A well-known limitation of the pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary. This potentially weakens the effect when applying pretrained models to natural language generation (NLG) tasks, especially when the subword distributions of upstream and downstream tasks differ significantly. To address this problem, we extend the vanilla pretrain-finetune pipeline with an extra embedding transfer step. Specifically, a plug-and-play embedding generator is introduced to produce the representation of any input token, according to the pre-trained embeddings of its morphologically similar tokens. Thus, embeddings of mismatched tokens in downstream tasks can also be efficiently initialized. We conduct experiments on a variety of NLG tasks under the pretrain-finetune fashion. Experimental results and extensive analyses show that the proposed strategy offers the freedom to transfer the vocabulary, leading to more efficient and better-performing downstream NLG models.
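To illustrate the embedding-transfer idea, the sketch below initializes the vector of an out-of-vocabulary token from the pretrained embeddings of tokens sharing character n-grams. The n-gram similarity and weighted pooling are stand-in assumptions; the paper's generator is a learned, plug-and-play module.

import numpy as np

def char_ngrams(token, n=3):
    padded = f"<{token}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def generate_embedding(new_token, pretrained, topk=5):
    """`pretrained` maps known tokens to vectors; return an initial vector for `new_token`."""
    target = char_ngrams(new_token)
    scored = sorted(
        ((len(target & char_ngrams(t)) / len(target | char_ngrams(t)), t) for t in pretrained),
        reverse=True)[:topk]                       # most morphologically similar tokens
    weights = np.array([s for s, _ in scored])
    vectors = np.stack([pretrained[t] for _, t in scored])
    return (weights[:, None] * vectors).sum(0) / max(weights.sum(), 1e-8)

# Toy usage: initialize "plays" from a tiny pretrained vocabulary.
pretrained = {"playing": np.ones(4), "played": np.full(4, 2.0), "dog": np.zeros(4)}
print(generate_embedding("plays", pretrained, topk=2))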
Recently, knowledge distillation (KD) has shown great success in BERT compression. Instead of only learning from the teacher's soft labels as in conventional KD, researchers have found that the rich information contained in the hidden layers of BERT is conducive to the student's performance. To better exploit this hidden knowledge, a common practice is to force the student to deeply mimic the teacher's hidden states of all the tokens in a layer-wise manner. In this paper, however, we observe that although distilling the teacher's hidden state knowledge (HSK) is helpful, the performance gain (marginal utility) diminishes quickly as more HSK is distilled. To understand this effect, we conduct a series of analyses. Specifically, we divide the HSK of BERT into three dimensions, namely depth, length, and width. We first investigate a variety of strategies to extract crucial knowledge for each single dimension and then jointly compress the three dimensions. In this way, we show that 1) the student's performance can be improved by extracting and distilling the crucial HSK, and 2) using a tiny fraction of HSK can achieve the same performance as extensive HSK distillation. Based on the second finding, we further propose an efficient KD paradigm to compress BERT, which does not require loading the teacher during the training of the student. For two kinds of student models and computing devices, the proposed KD paradigm yields a training speedup of 2.7x ~ 3.4x.
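As a rough illustration of slicing HSK along the three dimensions named above, the sketch below keeps only a subset of layers (depth), token positions (length), and hidden units (width) from a teacher's hidden states. The top-layer/leading-token/leading-unit selection is a placeholder assumption, not the paper's extraction strategy.

import torch

def extract_hsk(hidden_states, keep_layers, keep_tokens, keep_width):
    """`hidden_states`: list of [batch, seq_len, hidden] tensors, one per teacher layer."""
    kept = hidden_states[-keep_layers:]                      # depth: top-most layers only
    kept = [h[:, :keep_tokens, :keep_width] for h in kept]   # length and width slices
    return torch.stack(kept, dim=1)                          # [batch, layers, tokens, width]

# Toy usage: 12 teacher layers, keep a small fraction of the HSK.
layers = [torch.randn(2, 128, 768) for _ in range(12)]
compact = extract_hsk(layers, keep_layers=4, keep_tokens=32, keep_width=256)
print(compact.shape)  # torch.Size([2, 4, 32, 256])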