Efficient Inference via Universal LSH Kernel

93 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zichang Liu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zichang Liu - Benjamin Coleman - Anshumali Shrivastava

التعلم الآلي الذكاء الاصطناعي بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Large machine learning models achieve unprecedented performance on various tasks and have evolved as the go-to technique. However, deploying these compute and memory hungry models on resource constraint environments poses new challenges. In this work, we propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations. Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name, providing a generic fundamental alternative to the problem of efficient inference that goes beyond the popular approach such as quantization, iterative pruning and knowledge distillation. A neural network function is transformed to its weighted kernel density representation, which can be very efficiently estimated with our sketching algorithm. Empirically, we show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in computation complexity without any drop in accuracy.

قيم البحث

314 - Xinjie Zhang , Jiawei Shao , Yuyi Mao 2021

Device-edge co-inference, which partitions a deep neural network between a resource-constrained mobile device and an edge server, recently emerges as a promising paradigm to support intelligent mobile applications. To accelerate the inference process , on-device model sparsification and intermediate feature compression are regarded as two prominent techniques. However, as the on-device model sparsity level and intermediate feature compression ratio have direct impacts on computation workload and communication overhead respectively, and both of them affect the inference accuracy, finding the optimal values of these hyper-parameters brings a major challenge due to the large search space. In this paper, we endeavor to develop an efficient algorithm to determine these hyper-parameters. By selecting a suitable model split point and a pair of encoder/decoder for the intermediate feature vector, this problem is casted as a sequential decision problem, for which, a novel automated machine learning (AutoML) framework is proposed based on deep reinforcement learning (DRL). Experiment results on an image classification task demonstrate the effectiveness of the proposed framework in achieving a better communication-computation trade-off and significant inference speedup against various baseline schemes.

التعلم الآلي الذكاء الاصطناعي معالجة الإشارات

75 - Zhao Kang , Yiwei Lu , Yuanzhang Su 2019

Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatically learning of similarity information from data via self-expression has been devel oped and successfully applied in various models, such as low-rank representation, sparse subspace learning, semi-supervised learning. However, it just tries to reconstruct the original data and some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework by minimizing the reconstruction error of kernel matrices, rather than the reconstruction error of original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, our proposed kernel preserving opens up a large number of possibilities to embed high-dimensional data into low-dimensional space.

التعلم الآلي الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

LCS Graph Kernel Based on Wasserstein Distance in Longest Common Subsequence Metric Space

52 - Jianming Huang , Zhongxi Fang , Hiroyuki Kasai 2020

For graph classification tasks, many methods use a common strategy to aggregate information of vertex neighbors. Although this strategy provides an efficient means of extracting graph topological features, it brings excessive amounts of information t hat might greatly reduce its accuracy when dealing with large-scale neighborhoods. Learning graphs using paths or walks will not suffer from this difficulty, but many have low utilization of each path or walk, which might engender information loss and high computational costs. To solve this, we propose a graph kernel using a longest common subsequence (LCS kernel) to compute more comprehensive similarity between paths and walks, which resolves substructure isomorphism difficulties. We also combine it with optimal transport theory to extract more in-depth features of graphs. Furthermore, we propose an LCS metric space and apply an adjacent point merge operation to reduce its computational costs. Finally, we demonstrate that our proposed method outperforms many state-of-the-art graph kernel methods.

التعلم الآلي الذكاء الاصطناعي بنى وهياكل البيانات والخوارزميات

Efficient Algorithms for Outlier-Robust Regression

82 - Adam Klivans , Pravesh K. Kothari , Raghu Meka 2018

We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions such as Gaussian, strongly log-concave distributions and, uniform distribution on the hypercube among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.

التعلم الآلي الذكاء الاصطناعي بنى وهياكل البيانات والخوارزميات

Robust Identifiability in Linear Structural Equation Models of Causal Inference

81 - Karthik Abinav Sankararaman , Anand Louis , Navin Goyal 2020

In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters the prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called ``bow-free paths. In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely ``bow free models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work. We then show that this sufficient condition holds with high probability which implies that for a large set of parameters robust identifiability holds and that for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.

التعلم الآلي الذكاء الاصطناعي بنى وهياكل البيانات والخوارزميات