The Curious Case of Convex Neural Networks


Published by: Sarath Sivaprasad

Publication date: 2020

Research field: Informatics Engineering, Mathematical Statistics

Paper language: English

Authors: Sarath Sivaprasad - Ankur Singh - Naresh Manwani

Machine Learning


In this paper, we investigate a constrained formulation of neural networks where the output is a convex function of the input. We show that the convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures. The convexity constraints are that the weights of every layer except the first are non-negative and that the activation function is convex and non-decreasing. Although simple, these constraints have profound implications for the generalization abilities of the network. We draw three valuable insights: (a) Input Output Convex Neural Networks (IOC-NNs) self-regularize and reduce the problem of overfitting; (b) although heavily constrained, they outperform the base multilayer perceptrons and perform comparably to the base convolutional architectures; and (c) IOC-NNs are robust to noise in the training labels. We demonstrate the efficacy of the proposed idea using thorough experiments and ablation studies on standard image classification datasets with three different neural network architectures.
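The two constraints named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the helper names (`make_ioc_mlp`, `clamp_non_negative`, `satisfies_convexity`), the choice of ReLU as the convex non-decreasing activation, and the clamping of negative weights to zero are all assumptions made for illustration. The sketch builds a tiny multilayer perceptron whose weights are non-negative in every layer except the first, then checks the convexity inequality f(t·x + (1−t)·y) ≤ t·f(x) + (1−t)·f(y) numerically.

```python
import random


def relu(z):
    # ReLU is convex and non-decreasing, so it satisfies the activation constraint.
    return max(0.0, z)


def dense(weights, biases, x):
    # Affine map: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def clamp_non_negative(weights):
    # Assumed way of enforcing the weight constraint: set negative weights to zero.
    return [[max(0.0, w) for w in row] for row in weights]


def make_ioc_mlp(sizes, rng):
    # sizes = [input_dim, hidden_dim, ..., output_dim]
    layers = []
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        if i > 0:
            # Every layer except the first must have non-negative weights.
            w = clamp_non_negative(w)
        layers.append((w, b))
    return layers


def forward(layers, x):
    h = x
    for i, (w, b) in enumerate(layers):
        h = dense(w, b, h)
        if i < len(layers) - 1:
            # Hidden layers apply the activation; the output layer stays affine.
            h = [relu(v) for v in h]
    return h


def satisfies_convexity(layers, x, y, t=0.5, tol=1e-9):
    # Checks f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for every output unit.
    mix = [t * a + (1.0 - t) * b for a, b in zip(x, y)]
    f_mix, f_x, f_y = forward(layers, mix), forward(layers, x), forward(layers, y)
    return all(m <= t * p + (1.0 - t) * q + tol
               for m, p, q in zip(f_mix, f_x, f_y))


rng = random.Random(0)
net = make_ioc_mlp([3, 8, 8, 2], rng)
x = [rng.uniform(-5.0, 5.0) for _ in range(3)]
y = [rng.uniform(-5.0, 5.0) for _ in range(3)]
print(satisfies_convexity(net, x, y, t=0.3))
```

The first layer may keep weights of either sign because an affine function of the input is already convex; the later layers need non-negative weights so that each one is a non-negative combination of convex functions followed by a convex, non-decreasing activation. In a training loop, one option (again an assumption, not something the abstract specifies) would be to reapply `clamp_non_negative` after each parameter update.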


Vikas Raunak, Arul Menezes, Marcin Junczys-Dowmunt, 2021

In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end of the spectrum of NMT pathologies. First, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman (2020), and present an empirically validated hypothesis that explains hallucinations under source perturbation. Second, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) could be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation.

Computation and Language, Artificial Intelligence, Machine Learning

On Hiding Neural Networks Inside Neural Networks

Chuan Guo, Ruihan Wu, Kilian Q. Weinberger, 2020

Modern neural networks often contain significantly more parameters than the size of their training data. We show that this excess capacity provides an opportunity for embedding secret machine learning models within a trained neural network. Our novel framework hides the existence of a secret neural network with arbitrary desired functionality within a carrier network. We prove theoretically that the secret network's detection is computationally infeasible and demonstrate empirically that the carrier network does not compromise the secret network's disguise. Our paper introduces a previously unknown steganographic technique that can be exploited by adversaries if left unchecked.

Machine Learning

Hyperbolic Neural Networks++

Ryohei Shimizu, Yusuke Mukuta, Tatsuya Harada, 2020

Hyperbolic spaces, which have the capacity to embed tree structures without distortion owing to their exponential volume growth, have recently been applied to machine learning to better capture the hierarchical nature of data. In this study, we generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model. This novel methodology constructs a multinomial logistic regression, fully-connected layers, convolutional layers, and attention mechanisms under a unified mathematical interpretation, without increasing the parameters. Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, and stability and outperformance over their Euclidean counterparts.

Machine Learning

Hypergraph Neural Networks

Yifan Feng, Haoxuan You, Zizhao Zhang, 2018

In this paper, we present a hypergraph neural network (HGNN) framework for data representation learning, which can encode high-order data correlation in a hypergraph structure. Confronting the challenges of learning representations for complex data in real practice, we propose to incorporate such data structure in a hypergraph, which is more flexible for data modeling, especially when dealing with complex data. In this method, a hyperedge convolution operation is designed to handle the data correlation during representation learning. In this way, the traditional hypergraph learning procedure can be conducted efficiently using hyperedge convolution operations. HGNN is able to learn the hidden layer representation considering the high-order data structure, which makes it a general framework for handling complex data correlations. We have conducted experiments on citation network classification and visual object recognition tasks and compared HGNN with graph convolutional networks and other traditional methods. Experimental results demonstrate that the proposed HGNN method outperforms recent state-of-the-art methods. The results also show that the proposed HGNN is superior to existing methods when dealing with multi-modal data.

Machine Learning

The Curious Case of NGC6908

B.F. Madore, 2007

The object NGC6908 was once thought to be simply a surface-brightness enhancement in the eastern spiral arm of the nearby spiral galaxy NGC6907. Based on an examination of near-infrared imaging, the object is shown in fact to be a lenticular S0(6/7) galaxy hidden in the optical glare of the disk and spiral structure of the larger galaxy. New radial velocities of NGC6908 (3,060 ± 16 km/s from emission; 3,113 ± 73 km/s from absorption) have been obtained at the Baade 6.5m and the duPont 2.5m telescopes at Las Campanas, Chile, placing NGC6908 at the same expansion-velocity distance as NGC6907 (3,190 ± 5 km/s) and eliminating the possibility of a purely chance line-of-sight coincidence. The once-enigmatic asymmetries in the disk and outer spiral structure of NGC6907 are now explained as being due to an advanced merger event. Newly discovered tails and debris in the outer reaches of this galaxy further support the merger scenario for this system. This pair of galaxies is a rather striking example of two objects discovered over 100 years ago, whose true nature was lost until modern detectors operating at infrared wavelengths gave us a new (high-contrast) look. Other examples of embedded merger remnants may also reveal themselves in the growing samples of near-infrared imaging of nearby galaxies; a pilot study does reveal several other promising candidates for follow-up observations.