أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Mo Zhou

Adversarial Attack and Defense in Deep Ranking

263 - Mo Zhou , Le Wang , Zhenxing Niu 2021

Deep Neural Network classifiers are vulnerable to adversarial attack, where an imperceptible perturbation could result in misclassification. However, the vulnerability of DNN-based image ranking systems remains under-explored. In this paper, we propo se two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversarial perturbations. Specifically, the expected ranking order is first represented as a set of inequalities, and then a triplet-like objective function is designed to obtain the optimal perturbation. Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks, where the model learns to prevent the positive and negative samples being pulled close to each other by adversarial attack. To comprehensively measure the empirical adversarial robustness of a ranking model with our defense, we propose an empirical robustness score, which involves a set of representative attacks against ranking models. Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets. Experimental results demonstrate that a typical deep ranking system can be effectively compromised by our attacks. Nevertheless, our defense can significantly improve the ranking system robustness, and simultaneously mitigate a wide range of attacks.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks

106 - Mo Zhou , Jiequn Han , Jianfeng Lu 2021

We propose a novel numerical method for high dimensional Hamilton--Jacobi--Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs, reformulated as optimal control problems, are tackled by the actor-critic framework inspired b y reinforcement learning, based on neural network parametrization of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function, we derive a variance reduced least square temporal difference method (VR-LSTD) using stochastic calculus. To numerically discretize the stochastic control problem, we employ an adaptive stepsize scheme to improve the accuracy near the domain boundary. Numerical examples up to $20$ spatial dimensions including the linear quadratic regulators, the stochastic Van der Pol oscillators, and the diffusive Eikonal equations are presented to validate the effectiveness of our proposed method.

التحسين والتحكم التحليل العددي التحليل العددي

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

336 - Mo Zhou , Rong Ge , Chi Jin 2021

While over-parameterization is widely believed to be crucial for the success of optimization for the neural networks, most existing theories on over-parameterization do not fully explain the reason -- they either work in the Neural Tangent Kernel reg ime where neurons dont move much, or require an enormous number of neurons. In practice, when the data is generated using a teacher neural network, even mildly over-parameterized neural networks can achieve 0 loss and recover the directions of teacher neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural net. We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least as large as the number of teacher neurons, and our convergence rate is independent of the number of student neurons. A key component of our analysis is the new characterization of local optimization landscape -- we show the gradient satisfies a special case of Lojasiewicz property which is different from local strong convexity or PL conditions used in previous work.

التعلم الآلي التعلم الالي

Ladder Loss for Coherent Visual-Semantic Embedding

70 - Mo Zhou , Zhenxing Niu , Le Wang 2019

For visual-semantic embedding, the existing methods normally treat the relevance between queries and candidates in a bipolar way -- relevant or irrelevant, and all irrelevant candidates are uniformly pushed away from the query by an equal margin in t he embedding space, regardless of their various proximity to the query. This practice disregards relatively discriminative information and could lead to suboptimal ranking in the retrieval results and poorer user experience, especially in the long-tail query scenario where a matching candidate may not necessarily exist. In this paper, we introduce a continuous variable to model the relevance degree between queries and multiple candidates, and propose to learn a coherent embedding space, where candidates with higher relevance degrees are mapped closer to the query than those with lower relevance degrees. In particular, the new ladder loss is proposed by extending the triplet loss inequality to a more general inequality chain, which implements variable push-away margins according to respective relevance degrees. In addition, a proper Coherent Score metric is proposed to better measure the ranking results including those irrelevant candidates. Extensive experiments on multiple datasets validate the efficacy of our proposed method, which achieves significant improvement over existing state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط

Towards Understanding the Importance of Shortcut Connections in Residual Networks

93 - Tianyi Liu , Minshuo Chen , Mo Zhou 2019

Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers, and exhibits efficient training using simple first order algorithms. Despite of the great empirical success, the reaso n behind is far from being well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.

التعلم الآلي التحسين والتحكم التعلم الالي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد