Leveraging Sparse Linear Layers for Debuggable Deep Networks

92 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Eric Wong

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Eric Wong - Shibani Santurkar - Aleksander Mk{a}dry

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantiatively via numerical and human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks. The code for our toolkit can be found at https://github.com/madrylab/debuggabledeepnetworks.

قيم البحث

92 - Junchen Yang , Ofir Lindenbaum , Yuval Kluger 2021

Despite the enormous success of neural networks, they are still hard to interpret and often overfit when applied to low-sample-size (LSS) datasets. To tackle these obstacles, we propose a framework for training locally sparse neural networks where th e local sparsity is learned via a sample-specific gating mechanism that identifies the subset of most relevant features for each measurement. The sample-specific sparsity is predicted via a textit{gating} network, which is trained in tandem with the textit{prediction} network. By learning these subsets and weights of a prediction model, we obtain an interpretable neural network that can handle LSS data and can remove nuisance variables, which are irrelevant for the supervised learning task. Using both synthetic and real-world datasets, we demonstrate that our method outperforms state-of-the-art models when predicting the target function with far fewer features per instance.

التعلم الآلي التعلم الالي

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

66 - Junjie Liu , Zhe Xu , Runbin Shi 2020

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These thresholds can ha ve fine-grained layer-wise adjustments dynamically via backpropagation. We demonstrate that our dynamic sparse training algorithm can easily train very sparse neural network models with little performance loss using the same number of training epochs as dense models. Dynamic Sparse Training achieves the state of the art performance compared with other sparse training algorithms on various network architectures. Additionally, we have several surprising observations that provide strong evidence for the effectiveness and efficiency of our algorithm. These observations reveal the underlying problems of traditional three-stage pruning algorithms and present the potential guidance provided by our algorithm to the design of more compact network architectures.

التعلم الآلي التعلم الالي

Scalable Deep Generative Modeling for Sparse Graphs

98 - Hanjun Dai , Azade Nazi , Yujia Li 2020

Learning graph generative models is a challenging task for deep learning and has wide applicability to a range of domains like chemistry, biology and social science. However current deep neural methods suffer from limited scalability: for a graph wit h $n$ nodes and $m$ edges, existing deep neural methods require $Omega(n^2)$ complexity by building up the adjacency matrix. On the other hand, many real world graphs are actually sparse in the sense that $mll n^2$. Based on this, we develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix, and importantly reduces the graph generation time complexity to $O((n + m)log n)$. Furthermore, during training this autoregressive model can be parallelized with $O(log n)$ synchronization stages, which makes it much more efficient than other autoregressive models that require $Omega(n)$. Experiments on several benchmarks show that the proposed approach not only scales to orders of magnitude larger graphs than previously possible with deep autoregressive graph generative models, but also yields better graph generation quality.

التعلم الآلي التعلم الالي

Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units

105 - Yixi Xu , Xiao Wang 2018

This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks. We establish the upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $qle p^*$, and $1/p+1/p^{*}=1$, we discuss properties of a width-independent capacity control, which only depends on depth by a square root term. We further analyze the approximation properties of $L_{p,q}$ weight normalized deep neural networks. In particular, for an $L_{1,infty}$ weight normalized network, the approximation error can be controlled by the $L_1$ norm of the output layer, and the corresponding generalization error only depends on the architecture by the square root of the depth.

التعلم الآلي التعلم الالي

Mixture of Linear Models Co-supervised by Deep Neural Networks

106 - Beomseok Seo , Lin Lin , 2021

Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the comple xity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true for every application, in some applications, especially in economics, social science, healthcare industry, and administrative decision making, scientists or practitioners are resistant to use predictions made by a black-box system for multiple reasons. One reason is that a major purpose of a study can be to make discoveries based upon the prediction function, e.g., to reveal the relationships between measurements. Another reason can be that the training dataset is not large enough to make researchers feel completely sure about a purely data-driven result. Being able to examine and interpret the prediction function will enable researchers to connect the result with existing knowledge or gain insights about new directions to explore. Although classic statistical models are much more explainable, their accuracy often falls considerably below DNN. In this paper, we propose an approach to fill the gap between relatively simple explainable models and DNN such that we can more flexibly tune the trade-off between interpretability and accuracy. Our main idea is a mixture of discriminative models that is trained with the guidance from a DNN. Although mixtures of discriminative models have been studied before, our way of generating the mixture is quite different.

التعلم الآلي التعلم الالي