No Arabic abstract
Various factorization-based methods have been proposed to leverage second-order, or higher-order cross features for boosting the performance of predictive models. They generally enumerate all the cross features under a predefined maximum order, and then identify useful feature interactions through model training, which suffer from two drawbacks. First, they have to make a trade-off between the expressiveness of higher-order cross features and the computational cost, resulting in suboptimal predictions. Second, enumerating all the cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that learns arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer to convert the power of each feature in a feature combination into the coefficient to be learned. The experimental results on four real datasets demonstrate the superior predictive performance of AFN against the start-of-the-arts.
Factorization machine (FM) is a prevalent approach to modeling pairwise (second-order) feature interactions when dealing with high-dimensional sparse data. However, on the one hand, FM fails to capture higher-order feature interactions suffering from combinatorial expansion, on the other hand, taking into account interaction between every pair of features may introduce noise and degrade prediction accuracy. To solve the problems, we propose a novel approach Graph Factorization Machine (GraphFM) by naturally representing features in the graph structure. In particular, a novel mechanism is designed to select the beneficial feature interactions and formulate them as edges between features. Then our proposed model which integrates the interaction function of FM into the feature aggregation strategy of Graph Neural Network (GNN), can model arbitrary-order feature interactions on the graph-structured features by stacking layers. Experimental results on several real-world datasets has demonstrated the rationality and effectiveness of our proposed approach.
Good quality similarity metrics can significantly facilitate the performance of many large-scale, real-world applications. Existing studies have proposed various solutions to learn a Mahalanobis or bilinear metric in an online fashion by either restricting distances between similar (dissimilar) pairs to be smaller (larger) than a given lower (upper) bound or requiring similar instances to be separated from dissimilar instances with a given margin. However, these linear metrics learned by leveraging fixed bounds or margins may not perform well in real-world applications, especially when data distributions are complex. We aim to address the open challenge of Online Adaptive Metric Learning (OAML) for learning adaptive metric functions on the fly. Unlike traditional online metric learning methods, OAML is significantly more challenging since the learned metric could be non-linear and the model has to be self-adaptive as more instances are observed. In this paper, we present a new online metric learning framework that attempts to tackle the challenge by learning an ANN-based metric with adaptive model complexity from a stream of constraints. In particular, we propose a novel Adaptive-Bound Triplet Loss (ABTL) to effectively utilize the input constraints and present a novel Adaptive Hedge Update (AHU) method for online updating the model parameters. We empirically validate the effectiveness and efficacy of our framework on various applications such as real-world image classification, facial verification, and image retrieval.
We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics, by imitating a small number of demonstrations collected from a single source domain. This is an important problem in robotic learning because in real world scenarios 1) reward functions are hard to obtain, 2) learned policies from one domain are difficult to deploy in another due to varying source to target domain statistics, 3) collecting expert demonstrations in multiple environments where the dynamics are known and controlled is often infeasible. We address these constraints by building upon recent advances in adversarial imitation learning; we condition our policy on a learned dynamics embedding and we employ a domain-adversarial loss to learn a dynamics-invariant discriminator. The effectiveness of our method is demonstrated on simulated control tasks with varying environment dynamics and the learned adaptive agent outperforms several recent baselines.
We build a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms. Our approach enables the task-similarity to be learned adaptively, provides sharper transfer-risk bounds in the setting of statistical learning-to-learn, and leads to straightforward derivations of average-case regret bounds for efficient algorithms in settings where the task-environment changes dynamically or the tasks share a certain geometric structure. We use our theory to modify several popular meta-learning algorithms and improve their meta-test-time performance on standard problems in few-shot learning and federated learning.
In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $delta_t$: We provide a lower threshold below which reward-poisoning attack is infeasible and RL is certified to be safe; we provide a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive where $delta_t$ depends only on $(s_t,a_t, s_{t+1})$, or adaptive where $delta_t$ depends further on the RL agents learning process at time $t$. Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in steps polynomial in state-space size $|S|$, whereas non-adaptive attacks require exponential steps. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show that empirically an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.