Kernel methods are fundamental in machine learning, and faster algorithms for kernel approximation provide direct speedups for many core tasks in machine learning. The polynomial kernel is especially important as other kernels can often be approximated by the polynomial kernel via a Taylor series expansion. Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic. However, for more slowly growing kernels, such as the neural tangent and arc-cosine kernels, $q$ needs to be polynomial, and previous work incurs a polynomial-factor slowdown in the running time. We give a new oblivious sketch which greatly improves upon this running time, by removing the dependence on $q$ in the leading-order term. Combined with a novel sampling scheme, we give the fastest algorithms for approximating a large family of slow-growing kernels.
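As a point of reference for what a randomized approximation of the polynomial kernel looks like, the following is a minimal numpy sketch of a generic unbiased feature map for $K(x,y) = (x \cdot y)^q$ (a product of independent Gaussian projections); it illustrates the general idea only and is not the oblivious sketch constructed in this work, and the dimensions and sketch size are illustrative.

```python
# Minimal illustration (not this paper's sketch): an unbiased randomized
# feature map for the degree-q polynomial kernel K(x, y) = (x . y)^q.
# Each feature is a product of q independent Gaussian projections, so
# E[z(x) . z(y)] = (x . y)^q; accuracy improves with the sketch size s.
import numpy as np

def poly_kernel_features(X, q, s, rng):
    """Map rows of X (n x d) to s randomized features for K(x, y) = (x . y)^q."""
    n, d = X.shape
    Z = np.ones((n, s))
    for _ in range(q):
        G = rng.standard_normal((d, s))   # fresh Gaussian projections for each factor
        Z *= X @ G                        # multiply the q projection factors elementwise
    return Z / np.sqrt(s)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
q, s = 3, 20000
Z = poly_kernel_features(X, q, s, rng)
exact = (X @ X.T) ** q                    # exact polynomial kernel matrix
approx = Z @ Z.T                          # sketched approximation
print(np.max(np.abs(exact - approx)) / np.max(np.abs(exact)))
```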
The rapid development of chargeable devices has caused a great deal of interest in efficient and stable wireless power transfer (WPT) solutions. Most conventional WPT technologies exploit outdated electromagnetic field control methods proposed in the 20th century, in which some essential parameters are sacrificed in favour of others (e.g., efficiency versus stability), leaving available WPT systems far from optimal. Over the last few years, the development of novel approaches to electromagnetic field manipulation has enabled many up-and-coming technologies holding great promise for advanced WPT. Examples include coherent perfect absorption, exceptional points in non-Hermitian systems, non-radiating states and anapoles, and advanced artificial materials and metastructures. This work overviews recent achievements in novel physical effects and materials for advanced WPT. We provide a consistent analysis of existing technologies, their pros and cons, and attempt to envision possible future directions.
We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions. We formulate the value function estimation procedure in value iteration as an approximate maximum inner product search problem and propose a locality sensitive hashing (LSH) [Indyk and Motwani STOC'98, Andoni and Razenshteyn STOC'15, Andoni, Laarhoven, Razenshteyn and Waingarten SODA'17] type data structure to solve this problem with sublinear time complexity. Moreover, we build connections between the theory of approximate maximum inner product search and the regret analysis of reinforcement learning. We prove that, with our choice of approximation factor, our Sublinear LSVI algorithms maintain the same regret as the original LSVI algorithms while reducing the runtime complexity to sublinear in the number of actions. To the best of our knowledge, this is the first work that combines LSH with reinforcement learning, resulting in provable improvements. We hope that our novel way of combining data structures and iterative algorithms will open the door for further study into cost reduction in optimization.
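To make the reduction concrete, here is a minimal sketch (not this paper's data structure) of how the greedy-action step of value iteration, $\arg\max_a \langle \phi(s,a), w \rangle$, can be served by an approximate maximum inner product search: the standard MIPS-to-angular reduction followed by sign-random-projection (SimHash) bucketing. The action features `feats` and the weight vector `w` are hypothetical placeholders.

```python
# Minimal sketch: approximate argmax_a <phi(s, a), w> via LSH (SimHash) after the
# standard MIPS-to-angular reduction. Illustrative only; not the paper's construction.
import numpy as np
from collections import defaultdict

class SimHashMIPS:
    def __init__(self, feats, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        norms = np.linalg.norm(feats, axis=1)
        M = norms.max()
        # Augment vectors so inner products become (scaled) cosine similarities.
        self.aug = np.hstack([feats, np.sqrt(np.maximum(M**2 - norms**2, 0.0))[:, None]])
        self.feats = feats
        self.H = rng.standard_normal((self.aug.shape[1], n_bits))
        codes = self.aug @ self.H > 0
        self.buckets = defaultdict(list)
        for i, c in enumerate(codes):
            self.buckets[c.tobytes()].append(i)

    def query(self, w):
        q = np.append(w, 0.0)                       # queries get a zero in the augmented coordinate
        code = (q @ self.H > 0).tobytes()
        cand = self.buckets.get(code, range(len(self.feats)))   # fall back to all actions if empty
        cand = np.asarray(list(cand))
        return cand[np.argmax(self.feats[cand] @ w)]

rng = np.random.default_rng(1)
feats = rng.standard_normal((10000, 32))            # one feature vector per action (hypothetical)
w = rng.standard_normal(32)                         # current value-function weights (hypothetical)
index = SimHashMIPS(feats)
a_hat = index.query(w)
print(a_hat, feats[a_hat] @ w, (feats @ w).max())   # approximate vs. exact maximum inner product
```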
Federated Learning (FL) is an emerging learning scheme that allows different distributed clients to train deep neural networks together without data sharing. Neural networks have become popular due to their unprecedented success. To the best of our knowledge, the theoretical guarantees of FL concerning neural networks with explicit forms and multi-step updates are unexplored. However, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are not even updating in the gradient direction. Existing convergence results for gradient descent-based methods rely heavily on the fact that the gradient direction is used for updating. This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to overparametrized ReLU neural networks trained by gradient descent in FL and is inspired by the analysis of the Neural Tangent Kernel (NTK). Theoretically, FL-NTK converges to a globally optimal solution at a linear rate with properly tuned learning parameters. Furthermore, with proper distributional assumptions, FL-NTK can also achieve good generalization.
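For intuition about the "multi-step updates, not in the gradient direction" point, the following is a minimal sketch of the kind of federated protocol being analyzed: each client runs several local gradient steps on a two-layer ReLU network (only the hidden weights trained, output layer fixed), and the server averages the clients' weights. Widths, learning rate, and data are illustrative assumptions, not the paper's exact setting.

```python
# Illustrative FL protocol sketch: multiple local GD steps per client on a
# two-layer ReLU net (hidden weights W trained, output weights a fixed),
# followed by server-side weight averaging. Not the paper's exact setup.
import numpy as np

def forward(W, a, X):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(len(a))      # width-normalized 2-layer ReLU net

def local_steps(W, a, X, y, lr, steps):
    for _ in range(steps):
        act = (X @ W.T > 0).astype(float)                      # ReLU activation pattern, (n, m)
        resid = forward(W, a, X) - y                           # residuals, (n,)
        # Gradient of 0.5 * ||f(X) - y||^2 with respect to W.
        grad = ((act * resid[:, None]) * a / np.sqrt(len(a))).T @ X
        W = W - lr * grad
    return W

rng = np.random.default_rng(0)
d, m, clients, n_per = 10, 256, 4, 50
a = rng.choice([-1.0, 1.0], size=m)                            # fixed output layer
W = rng.standard_normal((m, d))
data = [(rng.standard_normal((n_per, d)), rng.standard_normal(n_per)) for _ in range(clients)]

for rnd in range(20):                                          # communication rounds
    W = np.mean([local_steps(W.copy(), a, X, y, lr=0.02, steps=5) for X, y in data], axis=0)
    loss = np.mean([np.mean((forward(W, a, X) - y) ** 2) for X, y in data])
print("final average loss:", loss)
```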
In this work we examine the security of InstaHide, a recently proposed scheme for distributed learning (Huang et al.). A number of recent works have given reconstruction attacks for InstaHide in various regimes by leveraging an intriguing connection to the following matrix factorization problem: given the Gram matrix of a collection of $m$ random $k$-sparse Boolean vectors in $\{0,1\}^r$, recover the vectors (up to the trivial symmetries). Equivalently, this can be thought of as a sparse, symmetric variant of the well-studied problem of Boolean factor analysis, or as an average-case version of the classic problem of recovering a $k$-uniform hypergraph from its line graph. As previous algorithms either required $m$ to be exponentially large in $k$ or only applied to $k = 2$, they left open the question of whether InstaHide possesses some form of fine-grained security against reconstruction attacks for moderately large $k$. In this work, we answer this in the negative by giving a simple $O(m^{\omega + 1})$-time algorithm for the above matrix factorization problem. Our algorithm, based on tensor decomposition, only requires $m$ to be at least quasi-linear in $r$. We complement this result with a quasipolynomial-time algorithm for a worst-case setting of the problem where the collection of $k$-sparse vectors is chosen arbitrarily.
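The average-case instance described above is easy to write down explicitly; the following short numpy snippet (with illustrative parameters) generates it and shows what the solver is handed: a Gram matrix whose off-diagonal entries are the support-intersection sizes of the hidden $k$-sparse vectors.

```python
# Problem setup illustration (parameters are illustrative): sample m random
# k-sparse Boolean vectors in {0,1}^r and form their Gram matrix M = A A^T.
# Entry (i, j) is |S_i ∩ S_j|, i.e. the line graph of the k-uniform hypergraph
# with edges S_1, ..., S_m; the task is to recover A from M up to symmetries.
import numpy as np

rng = np.random.default_rng(0)
m, r, k = 200, 100, 4
A = np.zeros((m, r), dtype=int)
for i in range(m):
    A[i, rng.choice(r, size=k, replace=False)] = 1   # a random k-sparse Boolean row

M = A @ A.T                                          # the Gram matrix handed to the solver
assert np.all(np.diag(M) == k)                       # the diagonal reveals only the sparsity k
print("off-diagonal intersection sizes:", np.unique(M[~np.eye(m, dtype=bool)]))
```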
This paper introduces a new interior point method algorithm that solves semidefinite programming (SDP) with variable size $n \times n$ and $m$ constraints in the (current) matrix multiplication time $m^{\omega}$ when $m \geq \Omega(n^2)$. Our algorithm is optimal because even finding a feasible matrix that satisfies all the constraints requires solving a linear system in $m^{\omega}$ time. Our work improves the state-of-the-art SDP solver [Jiang, Kathuria, Lee, Padmanabhan and Song, FOCS 2020], and it is the first result showing that SDP can be solved in the optimal running time. Our algorithm is based on two novel techniques: $\bullet$ maintaining the inverse of a Kronecker product using lazy updates; $\bullet$ a general amortization scheme for positive semidefinite matrices.
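The first technique rests on the fact that the inverse of a Kronecker product factors, $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, so the inverse of an $n^2 \times n^2$ Kronecker product can be tracked through its two $n \times n$ factors. The snippet below is only a numerical check of that identity, not the paper's lazy-update data structure.

```python
# Minimal check of the identity behind the first technique:
# (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, so maintaining the inverse of a Kronecker
# product reduces to maintaining the two small factor inverses.
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned factors
B = rng.standard_normal((n, n)) + n * np.eye(n)

lhs = np.linalg.inv(np.kron(A, B))                 # invert the big n^2 x n^2 matrix directly
rhs = np.kron(np.linalg.inv(A), np.linalg.inv(B))  # invert only the small factors
print(np.allclose(lhs, rhs))                       # True
```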
Inspired by the InstaHide challenge [Huang, Song, Li and Arora '20], [Chen, Song and Zhuo '20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution. They show that it suffices to use $O(n_{\mathsf{priv}}^{k_{\mathsf{priv}} - 2/(k_{\mathsf{priv}} + 1)})$ samples to recover one private image in $n_{\mathsf{priv}}^{O(k_{\mathsf{priv}})} + \mathrm{poly}(n_{\mathsf{pub}})$ time for any integer $k_{\mathsf{priv}}$, where $n_{\mathsf{priv}}$ and $n_{\mathsf{pub}}$ denote the number of images used in the private and the public dataset to generate a mixed image sample. Under the current setup for the InstaHide challenge of mixing two private images ($k_{\mathsf{priv}} = 2$), this means $n_{\mathsf{priv}}^{4/3}$ samples are sufficient to recover a private image. In this work, we show that $n_{\mathsf{priv}} \log ( n_{\mathsf{priv}} )$ samples are sufficient (information-theoretically) for recovering all the private images.
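The $n_{\mathsf{priv}}^{4/3}$ figure for the challenge setting is just the earlier exponent evaluated at $k_{\mathsf{priv}} = 2$:

$$ k_{\mathsf{priv}} - \frac{2}{k_{\mathsf{priv}} + 1} \;\Big|_{k_{\mathsf{priv}} = 2} \;=\; 2 - \frac{2}{3} \;=\; \frac{4}{3}, $$

so the prior guarantee requires $n_{\mathsf{priv}}^{4/3}$ mixed samples, whereas the bound in this work requires only $n_{\mathsf{priv}} \log n_{\mathsf{priv}}$.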
An unsolved challenge in distributed or federated learning is to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide, which aims to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. Such an encryption step is efficient and only affects the task performance slightly. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend attacks on shared gradients or representations, while the average accuracy reduction is only $1.9\%$. We also present an analysis of the security of TextHide using a conjecture about the computational intractability of a mathematical problem. Our code is available at https://github.com/Hazelsuko07/TextHide
How can multiple distributed entities collaboratively train a shared deep net on their private data while preserving privacy? This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines. The encryption is efficient, and applying it during training has only a minor effect on test accuracy. InstaHide encrypts each training image with a one-time secret key which consists of mixing a number of randomly chosen images and applying a random pixel-wise mask. Other contributions of this paper include: (a) using a large public dataset (e.g., ImageNet) for mixing during the encryption, which improves security; (b) experimental results showing effectiveness in preserving privacy against known attacks, with only minor effects on accuracy; (c) theoretical analysis showing that successfully attacking privacy requires attackers to solve a difficult computational problem; (d) demonstrating that use of the pixel-wise mask is important for security, since Mixup alone is shown to be insecure against some efficient attacks; (e) release of a challenge dataset: https://github.com/Hazelsuko07/InstaHide_Challenge. Our code is available at https://github.com/Hazelsuko07/InstaHide
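For concreteness, here is a minimal numpy sketch of the encryption as described above: mix a private image with randomly chosen public images using random mixing coefficients, then apply a one-time random pixel-wise sign mask. The number of mixed images, the coefficient scheme, and the image shapes are illustrative assumptions, not the exact parameters used in the paper.

```python
# Minimal sketch of the described one-time-key encryption (illustrative
# parameters): a k-image mixup followed by a random pixel-wise sign mask.
# In training, labels would be mixed with the same coefficients.
import numpy as np

def instahide_encrypt(private_img, public_pool, k, rng):
    """Return one encrypted image: a k-image mixup followed by a sign mask."""
    others = public_pool[rng.choice(len(public_pool), size=k - 1, replace=False)]
    imgs = np.concatenate([private_img[None], others], axis=0)    # the k images to mix
    lam = rng.dirichlet(np.ones(k))                               # random mixing weights
    mixed = np.tensordot(lam, imgs, axes=1)                       # convex combination of images
    mask = rng.choice([-1.0, 1.0], size=mixed.shape)              # one-time pixel-wise sign mask
    return mask * mixed

rng = np.random.default_rng(0)
public_pool = rng.random((1000, 32, 32, 3))    # stand-in for a large public dataset
private_img = rng.random((32, 32, 3))          # stand-in for one private training image
enc = instahide_encrypt(private_img, public_pool, k=4, rng=rng)
print(enc.shape)
```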
The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks have initiated an ongoing effort to develop faster $\mathit{second}$-$\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate ($\mathit{independent}$ of the training batch size $n$), second-order algorithms incur a daunting slowdown in the $\mathit{cost}$ $\mathit{per}$ $\mathit{iteration}$ (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19, CGH+19], yielding an $O(mn^2)$-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width $m$. We show how to speed up the algorithm of [CGH+19], achieving an $\tilde{O}(mn)$-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension ($mn$) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem, and then use a Fast-JL type dimension reduction to $\mathit{precondition}$ the underlying Gram matrix in time independent of $M$, allowing one to find a sufficiently good approximate solution via $\mathit{first}$-$\mathit{order}$ conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra -- which led to recent breakthroughs in $\mathit{convex}$ $\mathit{optimization}$ (ERM, LPs, regression) -- can be carried over to the realm of deep learning as well.
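To illustrate the sketch-to-precondition idea on a generic $\ell_2$-regression $\min_x \|Ax - b\|_2$ (not the paper's training algorithm), one can sketch $A$, take the $R$ factor of the sketched matrix, use it as a right preconditioner, and solve with a first-order method such as LSQR/CG; below, a dense Gaussian sketch stands in for the fast-JL transform, and all sizes are illustrative.

```python
# Sketch-and-precondition for least squares (illustrative, assumed parameters):
# the R factor of S @ A makes A R^{-1} well conditioned, so a first-order
# solver converges in a handful of iterations.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n, d = 5000, 100
A = rng.standard_normal((n, d)) * rng.gamma(1.0, size=d)   # columns on very different scales
b = rng.standard_normal(n)

S = rng.standard_normal((4 * d, n)) / np.sqrt(4 * d)       # dense Gaussian sketch stands in for fast-JL
_, R = np.linalg.qr(S @ A)                                 # small QR of the sketched matrix
AR = A @ np.linalg.inv(R)                                  # right-preconditioned, well-conditioned system
y = lsqr(AR, b, atol=1e-12, btol=1e-12)[0]                 # first-order (CG/LSQR) solve
x = np.linalg.solve(R, y)                                  # undo the preconditioner

x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact))
```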