
Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs

Posted by: Matthew Leavitt
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





The properties of individual neurons are often analyzed in order to understand the biological and artificial neural networks in which they are embedded. Class selectivity, typically defined as how different a neuron's responses are across different classes of stimuli or data samples, is commonly used for this purpose. However, it remains an open question whether it is necessary and/or sufficient for deep neural networks (DNNs) to learn class selectivity in individual units. We investigated the causal impact of class selectivity on network function by directly regularizing for or against class selectivity. Using this regularizer to reduce class selectivity across units in convolutional neural networks increased test accuracy by over 2% for ResNet18 trained on Tiny ImageNet. For ResNet20 trained on CIFAR10 we could reduce class selectivity by a factor of 2.5 with no impact on test accuracy, and reduce it nearly to zero with only a small ($\sim$2%) drop in test accuracy. In contrast, regularizing to increase class selectivity significantly decreased test accuracy across all models and datasets. These results indicate that class selectivity in individual units is neither sufficient nor strictly necessary, and can even impair DNN performance. They also encourage caution when treating the properties of single units as representative of the mechanisms by which DNNs function.
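For concreteness, here is a minimal sketch of what "regularizing for or against class selectivity" might look like in PyTorch. This is illustrative, not the authors' released code: it assumes a common form of the selectivity index (the gap between a unit's highest class-conditional mean activation and the mean over the remaining classes, normalized by their sum) and a scale factor alpha whose sign decides whether selectivity is promoted or penalized.

```python
import torch
import torch.nn.functional as F

def class_selectivity_index(activations, labels, num_classes, eps=1e-7):
    # activations: (batch, units) non-negative (e.g. post-ReLU) activations.
    # labels: (batch,) integer class labels.
    # Returns one selectivity value per unit, near 0 for unselective units.
    units = activations.size(1)
    sums = torch.zeros(num_classes, units, device=activations.device)
    counts = torch.zeros(num_classes, 1, device=activations.device)
    sums.index_add_(0, labels, activations)
    counts.index_add_(0, labels,
                      torch.ones(labels.size(0), 1, device=activations.device))
    class_means = sums / counts.clamp(min=1)           # (classes, units)
    mu_max, _ = class_means.max(dim=0)                 # most-driving class per unit
    mu_rest = (class_means.sum(dim=0) - mu_max) / (num_classes - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)

def selectivity_regularized_loss(logits, labels, layer_activations,
                                 alpha, num_classes):
    # layer_activations: list of (batch, units) tensors, one per layer.
    # alpha > 0 encourages class selectivity; alpha < 0 discourages it.
    task_loss = F.cross_entropy(logits, labels)
    mean_si = torch.stack([
        class_selectivity_index(a, labels, num_classes).mean()
        for a in layer_activations
    ]).mean()
    return task_loss - alpha * mean_si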




Read also

Sebastiano Vigna (2007). This note argues about the validity of web-graph data used in the literature.
Representational sparsity is known to affect robustness to input perturbations in deep neural networks (DNNs), but less is known about how the semantic content of representations affects robustness. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given recent evidence that class selectivity may not be necessary for, and in some cases can impair, generalization, we investigate whether it also confers robustness (or vulnerability) to perturbations of input data. We found that networks regularized to have lower levels of class selectivity were more robust to average-case (naturalistic) perturbations, while networks with higher class selectivity were more vulnerable. In contrast, class selectivity increases robustness to multiple types of worst-case (i.e. white box adversarial) perturbations, suggesting that while decreasing class selectivity is helpful for average-case perturbations, it is harmful for worst-case perturbations. To explain this difference, we studied the dimensionality of the networks' representations: we found that the dimensionality of early-layer representations is inversely proportional to a network's class selectivity, and that adversarial samples cause a larger increase in early-layer dimensionality than corrupted samples. Furthermore, the input-unit gradient is more variable across samples and units in high-selectivity networks compared to low-selectivity networks. These results lead to the conclusion that units participate more consistently in low-selectivity regimes compared to high-selectivity regimes, effectively creating a larger attack surface and hence vulnerability to worst-case perturbations.
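The dimensionality analysis mentioned above can be illustrated with one common estimator, the participation ratio of the covariance eigenvalues of a layer's representations; the paper's exact measure may differ, so treat this as an assumption-laden sketch:

```python
import torch

def participation_ratio(activations):
    # activations: (samples, features) layer representations.
    # PR = (sum_i lambda_i)^2 / sum_i lambda_i^2, where lambda_i are the
    # eigenvalues of the feature covariance matrix. PR is near 1 when a
    # single direction dominates and approaches the feature count when
    # variance is spread evenly, so it serves as an effective dimensionality.
    centered = activations - activations.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (activations.size(0) - 1)
    eigvals = torch.linalg.eigvalsh(cov)   # real, since cov is symmetric
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()
```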
We demonstrate how broadband angular selectivity can be achieved with stacks of one-dimensionally periodic photonic crystals, each consisting of alternating isotropic layers and effective anisotropic layers, where each effective anisotropic layer is constructed from a multilayered metamaterial. We show that by simply changing the structure of the metamaterials, the selective angle can be tuned to a broad range of angles; and, by increasing the number of stacks, the angular transmission window can be made as narrow as desired. As a proof of principle, we realize the idea experimentally in the microwave regime. The angular selectivity and tunability we report here can have various applications such as in directional control of electromagnetic emitters and detectors.
Deep learning emerges as an important new resource-intensive workload and has been successfully applied in computer vision, speech, natural language processing, and so on. Distributed deep learning is becoming a necessity to cope with growing data and model sizes. Its computation is typically characterized by a simple tensor data abstraction to model multi-dimensional matrices, a data-flow graph to model computation, and iterative executions with relatively frequent synchronizations, thereby making it substantially different from Map/Reduce style distributed big data computation. RPC, commonly used as the communication primitive, has been adopted by popular deep learning frameworks such as TensorFlow, which uses gRPC. We show that RPC is sub-optimal for distributed deep learning computation, especially on an RDMA-capable network. The tensor abstraction and data-flow graph, coupled with an RDMA network, offers the opportunity to reduce the unnecessary overhead (e.g., memory copy) without sacrificing programmability and generality. In particular, from a data access point of view, a remote machine is abstracted just as a device on an RDMA channel, with a simple memory interface for allocating, reading, and writing memory regions. Our graph analyzer looks at both the data flow graph and the tensors to optimize memory allocation and remote data access using this interface. The result is up to 25 times speedup in representative deep learning benchmarks against the standard gRPC in TensorFlow and up to 169% improvement even against an RPC implementation optimized for RDMA, leading to faster convergence in the training process.
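The "remote machine as a device" abstraction described above can be sketched as a minimal interface. The names below (RemoteDevice, alloc, read, write) are illustrative placeholders, not the paper's or TensorFlow's actual API:

```python
from abc import ABC, abstractmethod

class RemoteDevice(ABC):
    # Hypothetical sketch: a remote host exposed through a flat memory
    # interface over an RDMA channel, so tensor data can move between
    # machines without extra memory copies or remote CPU involvement.

    @abstractmethod
    def alloc(self, nbytes: int) -> int:
        """Register a remote memory region; return its remote address."""

    @abstractmethod
    def read(self, addr: int, nbytes: int) -> bytes:
        """One-sided RDMA read: fetch bytes without the remote CPU."""

    @abstractmethod
    def write(self, addr: int, data: bytes) -> None:
        """One-sided RDMA write directly into the remote region."""
```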
We show that electron correlations lead to a bad metallic state in the chalcogenides FeSe and FeTe despite the intermediate value of the Hubbard repulsion $U$ and Hund's rule coupling $J$. The evolution of the quasiparticle weight $Z$ as a function of the interaction terms reveals a clear crossover at $U \simeq 2.5$ eV. In the weak-coupling limit $Z$ decreases for all correlated $d$ orbitals as a function of $U$, and beyond the crossover coupling the $Z$ values become weakly dependent on $U$ while depending strongly on $J$. A marked orbital dependence of the $Z$ values emerges even though, in general, the orbital-selective Mott transition only occurs for relatively large values of $U$. This two-stage reduction of quasiparticle coherence, due to the combined effect of the Hubbard $U$ and Hund's $J$, suggests that the iron-based superconductors can be referred to as Hund's correlated metals.
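For reference, the quasiparticle weight $Z$ used above has the standard definition in terms of the low-frequency behavior of the self-energy $\Sigma(\omega)$ (a textbook formula, not quoted from this paper):

$$Z = \left(1 - \left.\frac{\partial\, \mathrm{Re}\,\Sigma(\omega)}{\partial \omega}\right|_{\omega = 0}\right)^{-1}, \qquad 0 < Z \le 1,$$

so that $Z \to 1$ for a non-interacting metal and $Z \to 0$ at a Mott transition, which is why a small $Z$ signals the bad metallic state discussed above.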
