
BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices

Posted by: Zhe Zhou
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





In recent years, Graph Neural Networks (GNNs) have emerged as state-of-the-art algorithms for analyzing non-Euclidean graph data. By applying deep learning to extract high-level representations from graph structures, GNNs achieve extraordinary accuracy and great generalization ability in various tasks. However, with ever-increasing graph sizes, more complicated GNN layers, and higher feature dimensions, the computational complexity of GNNs grows rapidly. Performing GNN inference in real time has become a challenging problem, especially on resource-limited edge-computing platforms. To tackle this challenge, we propose BlockGNN, a software-hardware co-design approach to efficient GNN acceleration. At the algorithm level, we leverage block-circulant weight matrices to greatly reduce the complexity of various GNN models. At the hardware level, we propose a pipelined CirCore architecture that supports efficient block-circulant matrix computation. Based on CirCore, we present a novel BlockGNN accelerator that computes various GNNs with low latency. Moreover, to determine the optimal configurations for diverse deployed tasks, we introduce a performance and resource model that helps choose the optimal hardware parameters automatically. Comprehensive experiments on the ZC706 FPGA platform demonstrate that, on various GNN tasks, BlockGNN achieves up to $8.3\times$ speedup over the baseline HyGCN architecture and $111.9\times$ energy reduction compared to the Intel Xeon CPU platform.
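As a rough illustration of the algorithm-level idea, a block-circulant weight matrix is stored as one length-$k$ vector per $k \times k$ block, and each block-vector product can be computed with FFTs in $O(k \log k)$ time instead of $O(k^2)$. The NumPy sketch below shows only the general technique, not the paper's CirCore pipeline; the block size and shapes are arbitrary.

import numpy as np

def circulant_matvec_fft(c, x):
    # Multiply the circulant matrix whose first column is c by x,
    # via circular convolution: O(k log k) instead of O(k^2).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(first_cols, x, k):
    # first_cols[i][j]: first column (length k) of the circulant block in
    # block-row i, block-column j of a (p*k) x (q*k) block-circulant matrix.
    p, q = len(first_cols), len(first_cols[0])
    y = np.zeros(p * k)
    for i in range(p):
        for j in range(q):
            y[i*k:(i+1)*k] += circulant_matvec_fft(first_cols[i][j],
                                                   x[j*k:(j+1)*k])
    return y

# Sanity check against the explicit dense block-circulant matrix.
rng = np.random.default_rng(0)
k, p, q = 4, 2, 3
cols = [[rng.standard_normal(k) for _ in range(q)] for _ in range(p)]
x = rng.standard_normal(q * k)
circ = lambda c: np.column_stack([np.roll(c, s) for s in range(len(c))])
dense = np.block([[circ(cols[i][j]) for j in range(q)] for i in range(p)])
assert np.allclose(dense @ x, block_circulant_matvec(cols, x, k))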


Read also

In this paper, we construct self-dual codes from a construction that involves both block circulant matrices and block quadratic residue circulant matrices. We provide conditions when this construction can yield self-dual codes. We construct self-dual codes of various lengths over F2 and F2 + uF2. Using extensions, neighbours and sequences of neighbours, we construct many new self-dual codes. In particular, we construct one new self-dual code of length 66 and 51 new self-dual codes of length 68.
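For a concrete feel of circulant-based code constructions, the sketch below brute-forces pure double circulant generator matrices [I | A] over GF(2), using the standard fact that such a code is self-dual exactly when $AA^T = I$ (mod 2). This is a classical related construction, not the block circulant and quadratic residue construction of the paper; the length 7 is chosen arbitrarily.

import numpy as np
from itertools import product

def circulant_gf2(first_row):
    # Binary circulant matrix from its first row (arithmetic mod 2).
    return np.array([np.roll(first_row, k) for k in range(len(first_row))]) % 2

def gives_self_dual_code(first_row):
    # A pure double circulant code with generator [I | A] over GF(2) is
    # self-dual exactly when A A^T = I (mod 2).
    A = circulant_gf2(np.array(first_row))
    return np.array_equal(A @ A.T % 2, np.eye(len(first_row), dtype=A.dtype))

n = 7
hits = [row for row in product([0, 1], repeat=n) if gives_self_dual_code(row)]
print(len(hits), "circulant first rows of length", n, "give self-dual [14, 7] codes")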
The high computation and memory-storage demands of large deep neural network (DNN) models pose intensive challenges to the conventional Von Neumann architecture, incurring substantial data movement in the memory hierarchy. The memristor crossbar array has emerged as a promising solution to mitigate these challenges and enable low-power acceleration of DNNs. Memristor-based weight pruning and weight quantization have been investigated separately and proven effective in reducing area and power consumption compared to the original DNN model. However, there has been no systematic investigation of memristor-based neuromorphic computing (NC) systems that considers both weight pruning and weight quantization. In this paper, we propose a unified and systematic memristor-based framework that combines structured weight pruning and weight quantization by incorporating the alternating direction method of multipliers (ADMM) into DNN training. We consider hardware constraints such as crossbar-block pruning, conductance range, and mismatch between weight values and real devices to achieve high accuracy, low power, and a small area footprint. Our framework consists of three main steps: memristor-based ADMM-regularized optimization, masked mapping, and retraining. Experimental results show that our proposed framework achieves a $29.81\times$ ($20.88\times$) weight compression ratio, with 98.38% (96.96%) power reduction and 98.29% (97.47%) area reduction on the VGG-16 (ResNet-18) network, with only 0.5% (0.76%) accuracy loss compared to the original DNN models. We share our models at http://bit.ly/2Jp5LHJ.
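The three-step flow (ADMM-regularized optimization, masked mapping, retraining) can be sketched on a toy least-squares problem, with column-level sparsity standing in for crossbar-block pruning. Everything here (shapes, rho, the number of kept columns) is invented for illustration and does not model the memristor hardware constraints handled in the paper.

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16))               # toy data
W_true = np.zeros((16, 4))
W_true[:, [0, 2]] = rng.standard_normal((16, 2)) # only 2 useful output columns
Y = X @ W_true

def keep_top_columns(W, keep):
    # "Structured pruning": zero out all but the `keep` columns of W with
    # the largest L2 norm (a stand-in for crossbar-block pruning).
    Z = np.zeros_like(W)
    idx = np.argsort(-np.linalg.norm(W, axis=0))[:keep]
    Z[:, idx] = W[:, idx]
    return Z

rho, lr, keep = 1.0, 1e-2, 2
W = 0.1 * rng.standard_normal((16, 4))
Z, U = keep_top_columns(W, keep), np.zeros_like(W)

for step in range(2000):
    # W-update: gradient step on loss + (rho/2) * ||W - Z + U||^2
    grad = 2 * X.T @ (X @ W - Y) / len(X) + rho * (W - Z + U)
    W -= lr * grad
    if step % 50 == 0:
        Z = keep_top_columns(W + U, keep)        # Z-update: projection
        U = U + W - Z                            # dual update

W_final = keep_top_columns(W, keep)              # masked mapping
print("kept columns:", np.flatnonzero(np.linalg.norm(W_final, axis=0)))
print("loss after pruning:", np.mean((X @ W_final - Y) ** 2))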
Given an ensemble of $N \times N$ random matrices, a natural question to ask is whether or not the empirical spectral measures of typical matrices converge to a limiting spectral measure as $N \to \infty$. While this has been proved for many thin patterned ensembles sitting inside all real symmetric matrices, frequently there is no nice closed-form expression for the limiting measure. Further, current theorems provide few pictures of transitions between ensembles. We consider the ensemble of symmetric $m$-block circulant matrices with i.i.d. entries. These matrices have toroidal diagonals periodic of period $m$. We view $m$ as a dial we can turn from the thin ensemble of symmetric circulant matrices, whose limiting eigenvalue density is a Gaussian, to all real symmetric matrices, whose limiting eigenvalue density is a semicircle. The limiting eigenvalue densities $f_m$ show a visually stunning convergence to the semicircle as $m$ tends to infinity, which we prove. In contrast to most studies of patterned matrix ensembles, our paper gives explicit closed-form expressions for the densities. We prove that $f_m$ is the product of a Gaussian and a degree $2m-2$ polynomial; the formula equals that of the $m \times m$ Gaussian Unitary Ensemble (GUE). The proof is by the method of moments. The new feature, which allows us to obtain closed-form expressions, is converting the central combinatorial problem in the moment calculation into an equivalent counting problem in algebraic topology. We end with a generalization of the $m$-block circulant pattern, dropping the assumption that the $m$ random variables be distinct. We prove that the limiting spectral distribution exists and is determined by the pattern of the independent elements within an $m$-period, depending not only on the frequency at which each element appears, but also on the way the elements are arranged.
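The interpolation between the circulant ($m = 1$, Gaussian) and full Wigner (semicircle) regimes can be observed numerically. The sketch below samples real symmetric $m$-block circulant matrices built as $\sum_k P^k \otimes A_k$ with the symmetry constraint $A_{K-k} = A_k^T$ (one plausible realization of the pattern, which may differ in minor conventions from the paper's ensemble) and reports the empirical fourth moment of the normalized eigenvalues, which should decrease from about 3 (Gaussian) toward 2 (semicircle) as $m$ grows.

import numpy as np

def sym_block_circulant(m, K, rng):
    # N x N (N = m*K) real symmetric m-block circulant matrix
    # C = sum_k P^k (x) A_k, with the m x m blocks tied by A_{K-k} = A_k^T.
    A = [rng.standard_normal((m, m)) for _ in range(K)]
    A[0] = (A[0] + A[0].T) / np.sqrt(2)
    if K % 2 == 0:
        A[K // 2] = (A[K // 2] + A[K // 2].T) / np.sqrt(2)
    for k in range(1, (K + 1) // 2):
        A[K - k] = A[k].T
    Pk = lambda k: np.roll(np.eye(K), k, axis=1)   # k-th power of the cyclic shift
    C = sum(np.kron(Pk(k), A[k]) for k in range(K))
    assert np.allclose(C, C.T)
    return C

rng = np.random.default_rng(0)
N, trials = 360, 10
for m in (1, 2, 4, 8):
    eigs = np.concatenate([np.linalg.eigvalsh(sym_block_circulant(m, N // m, rng))
                           for _ in range(trials)]) / np.sqrt(N)
    print(f"m={m}: empirical 4th moment = {np.mean(eigs ** 4):.2f}"
          "  (Gaussian limit: 3, semicircle limit: 2)")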
If $q = p^n$ is a prime power, then a $d$-dimensional $q$-Butson Hadamard matrix $H$ is a $d \times d$ matrix with all entries $q$th roots of unity such that $HH^* = dI_d$. We use algebraic number theory to prove a strong constraint on the dimension of a circulant $q$-Butson Hadamard matrix when $d = p^m$, and then explicitly construct a family of examples in all possible dimensions. These results relate to the long-standing circulant Hadamard matrix conjecture in combinatorics.
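As a quick numerical illustration of the defining relation $HH^* = dI_d$, the snippet below builds the classical circulant matrix with entries $\omega^{(j-k)^2}$ for an odd prime $p$ and checks the condition. This is only an elementary example of a circulant Butson Hadamard matrix, not the prime-power construction developed in the paper.

import numpy as np

p = 7                                         # an odd prime
omega = np.exp(2j * np.pi / p)                # primitive p-th root of unity
first_row = omega ** (np.arange(p) ** 2 % p)

# Circulant matrix with (k, j) entry omega^((j - k)^2 mod p).
H = np.array([np.roll(first_row, k) for k in range(p)])

# Defining relation of a Butson Hadamard matrix: H H^* = d I_d.
assert np.allclose(H @ H.conj().T, p * np.eye(p))
print(f"{p} x {p} circulant matrix over {p}-th roots of unity satisfies HH* = {p}I")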
Yuke Wang, Boyuan Feng, Gushu Li (2020)
As part of the emerging trend of graph-based deep learning, Graph Neural Networks (GNNs) excel for their capability to generate high-quality node feature vectors (embeddings). However, the existing one-size-fits-all GNN implementations are insufficient to catch up with the evolving GNN architectures, the ever-increasing graph sizes, and the diverse node embedding dimensionalities. To this end, we propose GNNAdvisor, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms. First, GNNAdvisor explores and identifies several performance-relevant features from both the GNN model and the input graph, and uses them as a new driving force for GNN acceleration. Second, GNNAdvisor implements a novel and highly efficient 2D workload management, tailored for GNN computation to improve GPU utilization and performance under different application settings. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by gracefully coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads. Furthermore, to enable automatic runtime optimization, GNNAdvisor incorporates a lightweight analytical model for an effective design parameter search. Extensive experiments show that GNNAdvisor outperforms state-of-the-art GNN computing frameworks such as Deep Graph Library ($3.02\times$ faster on average) and NeuGraph (up to $4.10\times$ faster) on mainstream GNN architectures across various datasets.
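The notion of 2D workload management can be illustrated generically: split each node's neighbor list into fixed-size groups and the embedding into dimension chunks, so that every (group, chunk) pair is a comparable unit of work. The sketch below is a hypothetical, sequential illustration of that general idea, not GNNAdvisor's actual implementation; all names and sizes are made up.

import numpy as np

def make_2d_workloads(indptr, group_size, dim_chunk, feat_dim):
    # Split each node's neighbor list into fixed-size groups (dimension 1)
    # and the embedding into chunks (dimension 2); every (group, chunk)
    # pair becomes one unit of work.
    units = []
    for v in range(len(indptr) - 1):
        for g in range(indptr[v], indptr[v + 1], group_size):
            for d in range(0, feat_dim, dim_chunk):
                units.append((v, g, min(g + group_size, indptr[v + 1]),
                              d, min(d + dim_chunk, feat_dim)))
    return units

def aggregate(indices, feats, units):
    # Sum-aggregate neighbor features one unit at a time (sequential here;
    # on a GPU each unit would map to a warp or thread block).
    out = np.zeros_like(feats)
    for v, g0, g1, d0, d1 in units:
        out[v, d0:d1] += feats[indices[g0:g1], d0:d1].sum(axis=0)
    return out

# Tiny CSR graph: node 0 -> {1, 2}, node 1 -> {0}, node 2 -> {0, 1}.
indptr = np.array([0, 2, 3, 5])
indices = np.array([1, 2, 0, 0, 1])
feats = np.arange(12, dtype=float).reshape(3, 4)
units = make_2d_workloads(indptr, group_size=2, dim_chunk=2, feat_dim=4)
print(aggregate(indices, feats, units))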
