Graph neural networks (GNNs), which learn node representations by recursively aggregating information from each node's neighbors, have become a predominant computational tool in many domains. To handle large-scale graphs, most existing methods partition the input graph into multiple sub-graphs (e.g., through node clustering) and apply batch training to reduce memory cost. However, such batch training leads to label bias within each batch, which in turn results in over-confident model predictions. Since connected nodes with positively related labels tend to be assigned to the same batch, the traditional cross-entropy minimization process attends to the predictions of the biased classes in the batch and may intensify overfitting. To overcome the label bias problem, we propose the adaptive label smoothing (ALS) method, which replaces the one-hot hard labels with smoothed ones and learns to reallocate label confidence from the biased classes to the others. Specifically, ALS propagates node labels to aggregate the neighborhood label distribution in a pre-processing step, and then updates the optimal smoothed labels online to adapt to the specific graph structure. Experiments on real-world datasets demonstrate that ALS can be generally applied to the main scalable learning frameworks to calibrate the biased labels and improve generalization performance.
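The pre-processing step described above, aggregating the label distribution of each node's neighborhood, can be illustrated with a minimal sketch. This is not the ALS implementation: the one-hop averaging, the fallback for isolated nodes, and the names `neighborhood_label_distribution`, `adj`, and `labels` are all assumptions made for illustration.

```python
def neighborhood_label_distribution(adj, labels, num_classes):
    """Return, for every node, the empirical label distribution of its neighbors.

    adj:    adjacency list, adj[i] is the list of neighbors of node i (assumed format)
    labels: labels[i] is the integer class of node i
    """
    distributions = []
    for node, neighbors in enumerate(adj):
        counts = [0.0] * num_classes
        for neighbor in neighbors:
            counts[labels[neighbor]] += 1.0
        total = sum(counts)
        if total == 0.0:
            # Assumption: an isolated node falls back to its own one-hot label.
            counts[labels[node]] = 1.0
            total = 1.0
        distributions.append([c / total for c in counts])
    return distributions


# Hypothetical toy graph: node 0 is connected to nodes 1 and 2.
adj = [[1, 2], [0], [0], []]
labels = [0, 1, 1, 0]
soft_targets = neighborhood_label_distribution(adj, labels, num_classes=2)
```

In a scheme of this kind, each row of `soft_targets` could serve as a starting point for a smoothed label that is then refined during training; how ALS performs that online update is not specified in the abstract.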
Label Smoothing (LS) is an effective regularizer to improve the generalization of state-of-the-art deep models. For each training sample, the LS strategy smooths the one-hot encoded training signal by distributing part of its mass over the non-ground-truth classes, aiming to penalize networks for generating overconfident output distributions. This paper introduces a novel label smoothing technique called Pairwise Label Smoothing (PLS). PLS takes a pair of samples as input. Smoothing with a pair of ground-truth labels enables PLS to preserve the relative distance between the two truth labels while further softening the distance between the truth labels and the other targets, resulting in models that produce much less confident predictions than the LS strategy. Also, unlike current LS methods, which typically require finding a global smoothing distribution mass through a cross-validation search, PLS automatically learns the distribution mass for each input pair during training. We empirically show that PLS significantly outperforms LS and the baseline models, achieving a relative classification error reduction of up to 30%. We also visually show that, when achieving such accuracy gains, PLS tends to produce very low winning softmax scores.
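For reference, the standard LS baseline described above can be written in a few lines. This sketch covers only uniform label smoothing with a fixed global mass, not PLS; the function name `smooth_label` and the parameter `eps` are illustrative choices.

```python
def smooth_label(true_class, num_classes, eps=0.1):
    """Uniform label smoothing: move a mass `eps` from the ground-truth class
    and spread it evenly over the remaining (non-ground-truth) classes."""
    if num_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if k == true_class else off_value
            for k in range(num_classes)]


# Example: class 0 out of 3 classes with a smoothing mass of 0.2.
target = smooth_label(0, 3, eps=0.2)
```

Here `eps` plays the role of the global smoothing mass that is usually tuned by cross-validation; per the abstract, PLS instead learns this mass for each input pair.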
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale in terms of both graph size and the number of model parameters. Although some work has explored training on large-scale graphs (e.g., GraphSAGE, ClusterGCN, etc.), we pioneer efficient training of large-scale GCN models (i.e., ultra-wide, overparameterized models) with the proposal of a novel, distributed training framework. Our proposed training methodology, called GIST, disjointly partitions the parameters of a GCN model into several smaller sub-GCNs that are trained independently and in parallel. In addition to being compatible with any GCN architecture, GIST improves model performance, scales to training on arbitrarily large graphs, significantly decreases wall-clock training time, and enables the training of markedly overparameterized GCN models. Remarkably, with GIST, we train an astonishingly wide 32,768-dimensional GraphSAGE model, which exceeds the capacity of a single GPU by a factor of 8X, to SOTA performance on the Amazon2M dataset.
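The idea of a disjoint partition can be sketched on the indices of a single hidden dimension. The abstract does not say how GIST chooses the partition or which layers it applies to, so the random shuffle, the near-equal group sizes, and the name `partition_hidden_units` are assumptions for illustration only.

```python
import random


def partition_hidden_units(hidden_dim, num_sub_models, seed=0):
    """Split the indices 0..hidden_dim-1 into disjoint, near-equal groups,
    one group per sub-model (assumed random assignment)."""
    if not 1 <= num_sub_models <= hidden_dim:
        raise ValueError("need 1 <= num_sub_models <= hidden_dim")
    indices = list(range(hidden_dim))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list yields disjoint groups that cover all indices.
    return [sorted(indices[i::num_sub_models]) for i in range(num_sub_models)]


# Hypothetical example: a 16-unit hidden layer split across 4 sub-models.
groups = partition_hidden_units(16, 4)
```

Each group would select the rows and columns of the weight matrices that one sub-model owns, so the sub-models share no parameters and can be trained independently.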
Neural dialogue generation models trained with the one-hot target distribution suffer from the over-confidence issue, which leads to poor generation diversity as widely reported in the literature. Although existing approaches such as label smoothing can alleviate this issue, they fail to adapt to diverse dialog contexts. In this paper, we propose an Adaptive Label Smoothing (AdaLabel) approach that can adaptively estimate a target label distribution at each time step for different contexts. The maximum probability in the predicted distribution is used to modify the soft target distribution produced by a novel light-weight bi-directional decoder module. The resulting target distribution is aware of both previous and future contexts and is adjusted to avoid over-training the dialogue model. Our model can be trained in an end-to-end manner. Extensive experiments on two benchmark datasets show that our approach outperforms various competitive baselines in producing diverse responses.
Recent studies have shown that graph neural networks (GNNs) are vulnerable to perturbations due to a lack of robustness and can therefore be easily fooled. Currently, most work on attacking GNNs relies on gradient information to guide the attack and achieves outstanding performance. However, the high time and space complexity of these methods makes them unmanageable for large-scale graphs and is the major bottleneck that prevents their practical use. We argue that the main reason is that they have to use the whole graph for attacks, so that time and space complexity increase as the data scale grows. In this work, we propose an efficient Simplified Gradient-based Attack (SGA) method to bridge this gap. SGA can cause GNNs to misclassify specific target nodes through a multi-stage attack framework, which needs only a much smaller subgraph. In addition, we present a practical metric named Degree Assortativity Change (DAC) to measure the impact of adversarial attacks on graph data. We evaluate our attack method on four real-world graph networks by attacking several commonly used GNNs. The experimental results demonstrate that SGA can achieve significant time and memory efficiency improvements while maintaining competitive attack performance compared to state-of-the-art attack techniques. Code is available at: https://github.com/EdisonLeeeee/SGAttack.
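The abstract does not give the formula for DAC, so the sketch below only computes the quantity it is built on: the degree assortativity coefficient of an undirected graph, i.e., the Pearson correlation between the degrees at the two ends of each edge. The edge-list input format and the function name `degree_assortativity` are assumptions, and the graph is assumed to have no self-loops or duplicate edges.

```python
import math


def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over all edges of a simple
    undirected graph given as a list of (u, v) pairs."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    if not xs:
        return float("nan")
    mean = sum(xs) / len(xs)  # xs and ys share the same mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    # The coefficient is undefined when all endpoint degrees are equal.
    return cov / var if var > 0 else float("nan")


# Hypothetical perturbation: add one edge to a star graph and compare.
clean = [(0, 1), (0, 2), (0, 3)]
perturbed = clean + [(1, 2)]
r_clean = degree_assortativity(clean)
r_perturbed = degree_assortativity(perturbed)
```

Comparing `r_clean` and `r_perturbed` gives a rough sense of how much an attack changes the degree structure; the exact normalization used by DAC should be taken from the paper or its code.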
This paper describes an end-to-end solution for the relationship prediction task in heterogeneous, multi-relational graphs. We particularly address two building blocks in the pipeline, namely heterogeneous graph representation learning and negative sampling. Existing message passing-based graph neural networks use edges only for graph traversal and/or the selection of message encoding functions. Ignoring the edge semantics could have severe repercussions on the quality of embeddings, especially when dealing with two nodes that have multiple relations. Furthermore, the expressivity of the learned representation depends on the quality of the negative samples used during training. Although existing hard negative sampling techniques can identify challenging negative relationships for optimization, new techniques are required to control false negatives during training, as false negatives could corrupt the learning process. To address these issues, first, we propose RelGNN -- a message passing-based heterogeneous graph attention model. In particular, RelGNN generates the states of different relations and leverages them along with the node states to weigh the messages. RelGNN also adopts a self-attention mechanism to balance the importance of attribute features and topological features for generating the final entity embeddings. Second, we introduce a parameter-free negative sampling technique -- adaptive self-adversarial (ASA) negative sampling. ASA reduces the false-negative rate by leveraging positive relationships to effectively guide the identification of true negative samples. Our experimental evaluation demonstrates that RelGNN optimized by ASA for relationship prediction improves state-of-the-art performance across established benchmarks as well as on a real industrial dataset.