Matching two different sets of items, known as the heterogeneous set-to-set matching problem, has recently attracted attention as a promising research problem. The difficulties lie in extracting features that match a correct pair of different sets while preserving the two types of exchangeability required for set-to-set matching: the pair of sets, as well as the items within each set, should be exchangeable. In this study, we propose a novel deep learning architecture that addresses these difficulties, together with an efficient training framework for set-to-set matching. We evaluate the methods through experiments based on two industrial applications: fashion set recommendation and group re-identification. In these experiments, we show that the proposed method provides significant improvements over state-of-the-art methods, thereby validating our architecture for the heterogeneous set matching problem.
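To illustrate the two exchangeability requirements, the following minimal sketch (our own construction, not the paper's architecture; the per-item encoder, the pooling choice, and the symmetric combination are all assumptions) shows how a matching score can be made invariant both to item order within each set and to swapping the two sets:

```python
import torch
import torch.nn as nn

class SetMatcher(nn.Module):
    """Hypothetical sketch of an exchangeability-preserving set matcher."""
    def __init__(self, dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # per-item encoder
        self.rho = nn.Linear(dim, 1)                              # pair scorer

    def forward(self, set_a, set_b):  # set_a: (n, dim), set_b: (m, dim)
        # Mean pooling over items makes each set embedding invariant
        # to the order of items within the set.
        za = self.phi(set_a).mean(dim=0)
        zb = self.phi(set_b).mean(dim=0)
        # A symmetric combination (sum) makes the score invariant
        # to exchanging the two sets.
        return self.rho(za + zb)
```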
Person re-identification (Re-ID) aims at matching images of the same person across disjoint camera views, which is a challenging problem in the multimedia analysis, multimedia editing, and content-based media retrieval communities. The major challenge lies in how to preserve the similarity of the same person across video footage with large appearance variations, while discriminating between different individuals. To address this problem, conventional methods usually consider the pairwise similarity between persons by measuring only the point-to-point (P2P) distance. In this paper, we propose to use deep learning to model a novel set-to-set (S2S) distance, in which the underlying objective focuses on preserving the compactness of intra-class samples for each camera view, while maximizing the margin between the intra-class set and the inter-class set. The S2S distance metric consists of three terms, namely the class-identity term, the relative distance term, and the regularization term. The class-identity term keeps the intra-class samples within each camera view gathered together, the relative distance term maximizes the distance between the intra-class set and the inter-class set across different camera views, and the regularization term smooths the parameters of the deep convolutional neural network (CNN). As a result, the final learned deep model can effectively find the correct match for a probe object among the various candidates in the video gallery by learning discriminative and stable feature representations. Using the CUHK01, CUHK03, PRID2011, and Market1501 benchmark datasets, we conducted extensive comparative evaluations to demonstrate the advantages of our method over state-of-the-art approaches.
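To make the three-term structure concrete, one plausible formalization is sketched below; the hinge form of the relative distance term, the margin m, the set distance d, and the weights lambda_1 and lambda_2 are our assumptions, not the paper's exact objective:

```latex
% S_c^v: samples of identity c in camera view v; f_W: CNN embedding.
\mathcal{L}(W) =
\underbrace{\sum_{v}\sum_{c}\sum_{x_i, x_j \in S_c^{v}}
    \bigl\| f_W(x_i) - f_W(x_j) \bigr\|_2^2}_{\text{class-identity term}}
+ \lambda_1 \underbrace{\sum_{c \neq c'}
    \max\bigl(0,\; m - d\bigl(S_c, S_{c'}\bigr)\bigr)}_{\text{relative distance term}}
+ \lambda_2 \underbrace{\lVert W \rVert_2^2}_{\text{regularization term}}
```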
Deep neural networks are vulnerable to adversarial examples. Prior defenses attempted to make deep networks more robust by either changing the network architecture or augmenting the training set with adversarial examples, but both approaches have inherent limitations. Motivated by recent research showing that outliers in the training set have a strong negative influence on the trained model, we studied the relationship between model robustness and the quality of the training set. We first show that outliers give the model better generalization ability but weaker robustness. Next, we propose an adversarial example detection framework, in which we design two methods for removing outliers from the training set to obtain a sanitized model, and then detect adversarial examples by computing the difference between the outputs of the original and the sanitized model. We evaluated the framework on both MNIST and SVHN. Based on the difference measured by the Kullback-Leibler divergence, we could detect adversarial examples with accuracy between 94.67% and 99.89%.
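The detection step reduces to comparing the two models' output distributions; a minimal sketch follows, assuming both models expose class probabilities and that a detection threshold has been calibrated on clean data (the interface and threshold are hypothetical):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class-probability vectors."""
    p = np.asarray(p) + eps
    q = np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def is_adversarial(x, original_model, sanitized_model, threshold):
    # Adversarial inputs exploit quirks of the original model that the
    # sanitized model lacks, so the two output distributions drift apart.
    p = original_model(x)   # softmax output of the original model
    q = sanitized_model(x)  # softmax output of the outlier-free model
    return kl_divergence(p, q) > threshold
```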
In this paper, we propose a new first-order gradient-based algorithm for training deep neural networks. We first introduce the sign operation on stochastic gradients (as in sign-based methods, e.g., SIGN-SGD) into ADAM, yielding an algorithm we call signADAM. Moreover, to make the rates at which different features are fitted more uniform, we define a confidence function that distinguishes different components of the gradients and apply it to our algorithm. The resulting method, which we call signADAM++, generates sparser gradients than existing algorithms. Both of our algorithms are easy to implement and can speed up the training of various deep neural networks. The motivation behind signADAM++ is to preferentially learn features from the most distinctive samples by applying large, useful gradient updates while ignoring useless information in the stochastic gradients. We also establish theoretical convergence guarantees for our algorithms. Empirical results on various datasets and models show that our algorithms yield much better performance than many state-of-the-art algorithms, including SIGN-SGD, SIGNUM, and ADAM. We also analyze the performance from multiple perspectives, including the loss landscape, and develop an adaptive method to further improve generalization. The source code is available at https://github.com/DongWanginxdu/signADAM-Learn-by-Confidence.
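One way to read the core signADAM idea is to feed the sign of the stochastic gradient through the usual ADAM moment estimates; the sketch below follows that reading (the exact placement of the sign operation and the default hyperparameters are assumptions, not taken from the paper or its repository):

```python
import numpy as np

def signadam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One parameter update; t is the 1-based step count."""
    s = np.sign(grad)                 # sign operation, as in SIGN-SGD
    m = b1 * m + (1 - b1) * s         # first moment of the signed gradient
    v = b2 * v + (1 - b2) * s ** 2    # second moment of the signed gradient
    m_hat = m / (1 - b1 ** t)         # ADAM-style bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```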
Deep Neural Networks (DNNs) deliver state-of-the-art performance in many image recognition and understanding applications. However, despite their outstanding performance, these models are black boxes, and it is hard to understand how they make their decisions. Over the past few years, researchers have studied the problem of explaining why DNNs produce the results they do. However, existing techniques are either obtrusive, requiring changes to model training, or suffer from low output quality. In this paper, we present a novel method, NeuroMask, for generating interpretable explanations of classification results. When applied to image classification models, NeuroMask identifies the image parts that are most important to the classifier's result by applying a mask that hides or reveals different parts of the image before feeding it back into the model. The mask values are tuned by minimizing a suitably designed cost function that preserves the classification result and encourages an interpretable mask. Experiments using state-of-the-art convolutional neural networks for image recognition on different datasets (CIFAR-10 and ImageNet) show that NeuroMask successfully localizes the parts of the input image that are most relevant to the DNN's decision. A visual quality comparison between NeuroMask explanations and those of other methods shows NeuroMask to be both accurate and interpretable.
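The mask-tuning loop described above might look like the following minimal PyTorch sketch; the sigmoid parameterization, the sparsity penalty, and all hyperparameters are our assumptions, not NeuroMask's published cost function:

```python
import torch
import torch.nn.functional as F

def mask_explain(model, image, label, steps=200, lam=0.01, lr=0.05):
    """image: (1, C, H, W) tensor; label: (1,) tensor of the predicted class."""
    logits = torch.zeros(1, 1, *image.shape[2:], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        m = torch.sigmoid(logits)        # mask values in [0, 1], one per pixel
        out = model(image * m)           # hide/reveal parts of the image
        # Preserve the classification while keeping the mask sparse.
        loss = F.cross_entropy(out, label) + lam * m.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(logits).detach()  # per-pixel importance map
```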
In recent years, deep learning has achieved remarkable results in image classification. However, image classification on small datasets has still not yielded good results. This article first briefly explains the applications and characteristics of convolutional neural networks and vision transformers, and introduces the influence of small datasets on classification as well as possible solutions. We then carry out a series of experiments on small datasets using various models and discuss the problems some models exhibit in these experiments. By comparing the experimental results, we recommend deep learning models according to the application environment. Finally, we give directions for future work.