ترغب بنشر مسار تعليمي؟ اضغط هنا

137 - Zhuobin Liang , Xiao Zhang 2021
We prove the positive energy conjecture for a class of asymptotically Horowitz-Myers metrics on $mathbb{R}^{2}timesmathbb{T}^{n-2}$. This generalizes the previous results of Barzegar-Chru{s}ciel-H{o}rzinger-Maliborski-Nguyen as well as the authors.
The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does no hold in the general RL paradigm where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of catastrophic interference and the collapse in performance as later training is likely to overwrite and interfer with previously learned policies. In this paper, we introduce the concept of context into single-task RL and develop a novel scheme, termed as Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the challenge of aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing on an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.
Recently, the object detection based on deep learning has proven to be vulnerable to adversarial patch attacks. The attackers holding a specially crafted patch can hide themselves from the state-of-the-art person detectors, e.g., YOLO, even in the ph ysical world. This kind of attack can bring serious security threats, such as escaping from surveillance cameras. In this paper, we deeply explore the detection problems about the adversarial patch attacks to the object detection. First, we identify a leverageable signature of existing adversarial patches from the point of the visualization explanation. A fast signature-based defense method is proposed and demonstrated to be effective. Second, we design an improved patch generation algorithm to reveal the risk that the signature-based way may be bypassed by the techniques emerging in the future. The newly generated adversarial patches can successfully evade the proposed signature-based defense. Finally, we present a novel signature-independent detection method based on the internal content semantics consistency rather than any attack-specific prior knowledge. The fundamental intuition is that the adversarial object can appear locally but disappear globally in an input image. The experiments demonstrate that the signature-independent method can effectively detect the existing and improved attacks. It has also proven to be a general method by detecting unforeseen and even other types of attacks without any attack-specific prior knowledge. The two proposed detection methods can be adopted in different scenarios, and we believe that combining them can offer a comprehensive protection.
Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than $mathcal{O}(epsilon^{-2})$ of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve the complexity of $mathcal{O}(epsilon^{-1.5})$, which outperforms all existing algorithms by the order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications. Our codes for MRBO, VRBO and other benchmarks are available $text{online}^1$.
143 - Kaiyi Ji , Yingbin Liang 2021
Bilevel optimization has recently attracted growing interests due to its wide applications in modern machine learning problems. Although recent studies have characterized the convergence rate for several such popular algorithms, it is still unclear h ow much further these convergence rates can be improved. In this paper, we address this fundamental question from two perspectives. First, we provide the first-known lower complexity bounds of $widetilde{Omega}(frac{1}{sqrt{mu_x}mu_y})$ and $widetilde Omegabig(frac{1}{sqrt{epsilon}}min{frac{1}{mu_y},frac{1}{sqrt{epsilon^{3}}}}big)$ respectively for strongly-convex-strongly-convex and convex-strongly-convex bilevel optimizations. Second, we propose an accelerated bilevel optimizer named AccBiO, for which we provide the first-known complexity bounds without the gradient boundedness assumption (which was made in existing analyses) under the two aforementioned geometries. We also provide significantly tighter upper bounds than the existing complexity when the bounded gradient assumption does hold. We show that AccBiO achieves the optimal results (i.e., the upper and lower bounds match up to logarithmic factors) when the inner-level problem takes a quadratic form with a constant-level condition number. Interestingly, our lower bounds under both geometries are larger than the corresponding optimal complexities of minimax optimization, establishing that bilevel optimization is provably more challenging than minimax optimization.
Masking of quantum information spreads it over nonlocal correlations and hides it from the subsystems. It is known that no operation can simultaneously mask all pure states [Phys. Rev. Lett. 120, 230501 (2018)], so in what sense is quantum informatio n masking useful? Here, we extend the definition of quantum information masking to general mixed states, and show that the resource of maskable quantum states are far more abundant than the no-go theorem seemingly suggests. Geometrically, the simultaneously maskable states lays on hyperdisks in the state hypersphere, and strictly contain the broadcastable states. We devise a photonic quantum information masking machine using time-correlated photons to experimentally investigate the properties of qubit masking, and demonstrate the transfer of quantum information into bipartite correlations and its faithful retrieval. The versatile masking machine has decent extensibility, and may be applicable to quantum secret sharing and fault-tolerant quantum communication. Our results provide some insights on the comprehension and potential application of quantum information masking.
Bilevel optimization has arisen as a powerful tool for many machine learning problems such as meta-learning, hyperparameter optimization, and reinforcement learning. In this paper, we investigate the nonconvex-strongly-convex bilevel optimization pro blem. For deterministic bilevel optimization, we provide a comprehensive convergence rate analysis for two popular algorithms respectively based on approximate implicit differentiation (AID) and iterative differentiation (ITD). For the AID-based method, we orderwisely improve the previous convergence rate analysis due to a more practical parameter selection as well as a warm start strategy, and for the ITD-based method we establish the first theoretical convergence rate. Our analysis also provides a quantitative comparison between ITD and AID based approaches. For stochastic bilevel optimization, we propose a novel algorithm named stocBiO, which features a sample-efficient hypergradient estimator using efficient Jacobian- and Hessian-vector product computations. We provide the convergence rate guarantee for stocBiO, and show that stocBiO outperforms the best known computational complexities orderwisely with respect to the condition number $kappa$ and the target accuracy $epsilon$. We further validate our theoretical results and demonstrate the efficiency of bilevel optimization algorithms by the experiments on meta-learning and hyperparameter optimization.
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~citet{hasselt2010double} overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $epsilon$-accurate neighborhood of the global optimum by taking $tilde{Omega}left(left( frac{1}{(1-gamma)^6epsilon^2}right)^{frac{1}{omega}} +left(frac{1}{1-gamma}right)^{frac{1}{1-omega}}right)$ iterations, where $omegain(0,1)$ is the decay parameter of the learning rate, and $gamma$ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation.
We revise the Non-Gaussianity of canonical curvaton scenario with a generalized $delta N$ formalism, in which it could handle the generic potentials. In various curvaton models, the energy density is dominant in different period including the seconda ry inflation of curvaton, matter domination and radiation domination. Our method could unify to deal with these periods since the non-linearity parameter $f_{rm NL}$ associated with Non-Gaussianity is a function of equation of state $w$. We firstly investigate the most simple curvaton scenario, namely the chaotic curvaton with quadratic potential. Our study shows that most parameter space satisfies with observational constraints. And our formula will nicely recover the well-known value of $f_{rm NL}$ in the absence of non-linear evolution. From the micro origin of curvaton, we also investigate the Pseudo-Nambu-Goldstone curvaton. Our result clearly indicates that the second short inflationary process for Pseudo-Nambu-Goldstone curvaton is ruled out in light of observations. Finally, our method sheds a new way for investigating the Non-Gaussianity of curvaton mechanism, espeically for exploring the Non-Gaussianity in MSSM curvaton model.
Generative adversarial imitation learning (GAIL) is a popular inverse reinforcement learning approach for jointly optimizing policy and reward from expert trajectories. A primary question about GAIL is whether applying a certain policy gradient algor ithm to GAIL attains a global minimizer (i.e., yields the expert policy), for which existing understanding is very limited. Such global convergence has been shown only for the linear (or linear-type) MDP and linear (or linearizable) reward. In this paper, we study GAIL under general MDP and for nonlinear reward function classes (as long as the objective function is strongly concave with respect to the reward parameter). We characterize the global convergence with a sublinear rate for a broad range of commonly used policy gradient algorithms, all of which are implemented in an alternating manner with stochastic gradient ascent for reward update, including projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL, trust region policy optimization (TRPO)-GAIL and natural policy gradient (NPG)-GAIL. This is the first systematic theoretical study of GAIL for global convergence.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا