On Characterizing the Capacity of Neural Networks using Algebraic Topology

63 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل William Guss

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف William H. Guss - Ruslan Salakhutdinov

التعلم الآلي الهندسة الحسابية الحوسبة العصبية والتطورية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The learnability of different neural architectures can be characterized directly by computable measures of data complexity. In this paper, we reframe the problem of architecture selection as understanding how data determines the most expressive and generalizable architectures suited to that data, beyond inductive bias. After suggesting algebraic topology as a measure for data complexity, we show that the power of a network to express the topological complexity of a dataset in its decision region is a strictly limiting factor in its ability to generalize. We then provide the first empirical characterization of the topological capacity of neural networks. Our empirical analysis shows that at every level of dataset complexity, neural networks exhibit topological phase transitions. This observation allowed us to connect existing theory to empirically driven conjectures on the choice of architectures for fully-connected neural networks.

قيم البحث

435 - Behnam Neyshabur , Ryota Tomioka , Nathan Srebro 2015

We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

التعلم الآلي الذكاء الاصطناعي الحوسبة العصبية والتطورية

Learning Algebraic Multigrid Using Graph Neural Networks

115 - Ilay Luz , Meirav Galun , Haggai Maron 2020

Efficient numerical solvers for sparse linear systems are crucial in science and engineering. One of the fastest methods for solving large-scale sparse linear systems is algebraic multigrid (AMG). The main challenge in the construction of AMG algorit hms is the selection of the prolongation operator -- a problem-dependent sparse matrix which governs the multiscale hierarchy of the solver and is critical to its efficiency. Over many years, numerous methods have been developed for this task, and yet there is no known single right answer except in very special cases. Here we propose a framework for learning AMG prolongation operators for linear systems with sparse symmetric positive (semi-) definite matrices. We train a single graph neural network to learn a mapping from an entire class of such matrices to prolongation operators, using an efficient unsupervised loss function. Experiments on a broad class of problems demonstrate improved convergence rates compared to classical AMG, demonstrating the potential utility of neural networks for developing sparse system solvers.

التعلم الآلي التعلم الالي

Progressive Weight Pruning of Deep Neural Networks using ADMM

134 - Shaokai Ye , Tianyun Zhang , Kaiqi Zhang 2018

Deep neural networks (DNNs) although achieving human-level performance in many domains, have very large model size that hinders their broader applications on edge computing devices. Extensive research work have been conducted on DNN model compression or pruning. However, most of the previous work took heuristic approaches. This work proposes a progressive weight pruning approach based on ADMM (Alternating Direction Method of Multipliers), a powerful technique to deal with non-convex optimization problems with potentially combinatorial constraints. Motivated by dynamic programming, the proposed method reaches extremely high pruning rate by using partial prunings with moderate pruning rates. Therefore, it resolves the accuracy degradation and long convergence time problems when pursuing extremely high pruning ratios. It achieves up to 34 times pruning rate for ImageNet dataset and 167 times pruning rate for MNIST dataset, significantly higher than those reached by the literature work. Under the same number of epochs, the proposed method also achieves faster convergence and higher compression rates. The codes and pruned DNN models are released in the link bit.ly/2zxdlss

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط الحوسبة العصبية والتطورية

Topological Detection of Trojaned Neural Networks

107 - Songzhu Zheng , Yikai Zhang , Hubert Wagner 2021

Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the models behavior through Trojaned training samples, which can later be exploited. Guided by ba sic neuroscientific principles we discover subtle -- yet critical -- structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from input to output layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.

التعلم الآلي الهندسة الحسابية

Design of Capacity-Approaching Low-Density Parity-Check Codes using Recurrent Neural Networks

52 - Eleni Nisioti , Nikolaos Thomos 2020

In this paper, we model Density Evolution (DE) using Recurrent Neural Networks (RNNs) with the aim of designing capacity-approaching Irregular Low-Density Parity-Check (LDPC) codes for binary erasure channels. In particular, we present a method for d etermining the coefficients of the degree distributions, characterizing the structure of an LDPC code. We refer to our RNN architecture as Neural Density Evolution (NDE) and determine the weights of the RNN that correspond to optimal designs by minimizing a loss function that enforces the properties of asymptotically optimal design, as well as the desired structural characteristics of the code. This renders the LDPC design process highly configurable, as constraints can be added to meet applications requirements by means of modifying the loss function. In order to train the RNN, we generate data corresponding to the expected channel noise. We analyze the complexity and optimality of NDE theoretically, and compare it with traditional design methods that employ differential evolution. Simulations illustrate that NDE improves upon differential evolution both in terms of asymptotic performance and complexity. Although we focus on asymptotic settings, we evaluate designs found by NDE for finite codeword lengths and observe that performance remains satisfactory across a variety of channels.

التعلم الآلي التعلم الالي