ترغب بنشر مسار تعليمي؟ اضغط هنا

220 - Zan Gao , Chao Sun , Zhiyong Cheng 2021
Finding tampered regions in images is a hot research topic in machine learning and computer vision. Although many image manipulation location algorithms have been proposed, most of them only focus on the RGB images with different color spaces, and th e frequency information that contains the potential tampering clues is often ignored. In this work, a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) is proposed for generic image manipulation localization in which the RGB stream, the frequency stream, and the boundary artifact location are explored in a unified framework. Specifically, we first design an adaptive frequency selection module (AFS) to adaptively select the appropriate frequency to mine inconsistent statistics and eliminate the interference of redundant statistics. Then, an adaptive cross-attention fusion module (ACF) is proposed to adaptively fuse the RGB feature and the frequency feature. Finally, the boundary artifact location network (BAL) is designed to locate the boundary artifacts for which the parameters are jointly updated by the outputs of the ACF, and its results are further fed into the decoder. Thus, the parameters of the RGB stream, the frequency stream, and the boundary artifact location network are jointly optimized, and their latent complementary relationships are fully mined. The results of extensive experiments performed on four public benchmarks of the image manipulation localization task, namely, CASIA1.0, COVER, Carvalho, and In-The-Wild, demonstrate that the proposed TBNet can significantly outperform state-of-the-art generic image manipulation localization methods in terms of both MCC and F1.
Recommender systems, which merely leverage user-item interactions for user preference prediction (such as the collaborative filtering-based ones), often face dramatic performance degradation when the interactions of users or items are insufficient. I n recent years, various types of side information have been explored to alleviate this problem. Among them, knowledge graph (KG) has attracted extensive research interests as it can encode users/items and their associated attributes in the graph structure to preserve the relation information. In contrast, less attention has been paid to the item-item co-occurrence information (i.e., textit{co-view}), which contains rich item-item similarity information. It provides information from a perspective different from the user/item-attribute graph and is also valuable for the CF recommendation models. In this work, we make an effort to study the potential of integrating both types of side information (i.e., KG and item-item co-occurrence data) for recommendation. To achieve the goal, we propose a unified graph-based recommendation model (UGRec), which integrates the traditional directed relations in KG and the undirected item-item co-occurrence relations simultaneously. In particular, for a directed relation, we transform the head and tail entities into the corresponding relation space to model their relation; and for an undirected co-occurrence relation, we project head and tail entities into a unique hyperplane in the entity space to minimize their distance. In addition, a head-tail relation-aware attentive mechanism is designed for fine-grained relation modeling. Extensive experiments have been conducted on several publicly accessible datasets to evaluate the proposed model. Results show that our model outperforms several previous state-of-the-art methods and demonstrate the effectiveness of our UGRec model.
A number of studies point out that current Visual Question Answering (VQA) models are severely affected by the language prior problem, which refers to blindly making predictions based on the language shortcut. Some efforts have been devoted to overco ming this issue with delicate models. However, there is no research to address it from the angle of the answer feature space learning, despite of the fact that existing VQA methods all cast VQA as a classification task. Inspired by this, in this work, we attempt to tackle the language prior problem from the viewpoint of the feature space learning. To this end, an adapted margin cosine loss is designed to discriminate the frequent and the sparse answer feature space under each question type properly. As a result, the limited patterns within the language modality are largely reduced, thereby less language priors would be introduced by our method. We apply this loss function to several baseline models and evaluate its effectiveness on two VQA-CP benchmarks. Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language prior problem in VQA from the angle of the answer feature space learning.
Disentanglement is defined as the problem of learninga representation that can separate the distinct, informativefactors of variations of data. Learning such a representa-tion may be critical for developing explainable and human-controllable Deep Gen erative Models (DGMs) in artificialintelligence. However, disentanglement in GANs is not a triv-ial task, as the absence of sample likelihood and posteriorinference for latent variables seems to prohibit the forwardstep. Inspired by contrastive learning (CL), this paper, froma new perspective, proposes contrastive disentanglement ingenerative adversarial networks (CD-GAN). It aims at dis-entangling the factors of inter-class variation of visual datathrough contrasting image features, since the same factorvalues produce images in the same class. More importantly,we probe a novel way to make use of limited amount ofsupervision to the largest extent, to promote inter-class dis-entanglement performance. Extensive experimental resultson many well-known datasets demonstrate the efficacy ofCD-GAN for disentangling inter-class variation.
Item-based collaborative filtering (ICF) enjoys the advantages of high recommendation accuracy and ease in online penalization and thus is favored by the industrial recommender systems. ICF recommends items to a target user based on their similaritie s to the previously interacted items of the user. Great progresses have been achieved for ICF in recent years by applying advanced machine learning techniques (e.g., deep neural networks) to learn the item similarity from data. The early methods simply treat all the historical items equally and recent ones distinguish the different importance of items for a prediction. Despite the progress, we argue that those ICF models neglect the diverse intents of users on adopting items (e.g., watching a movie because of the director, leading actors, or the visual effects). As a result, they fail to estimate the item similarity on a finer-grained level to predict the users preference for an item, resulting in sub-optimal recommendation. In this work, we propose a general factor-level attention method for ICF models. The key of our method is to distinguish the importance of different factors when computing the item similarity for a prediction. To demonstrate the effectiveness of our method, we design a light attention neural network to integrate both item-level and factor-level attention for neural ICF models. It is model-agnostic and easy-to-implement. We apply it to two baseline ICF models and evaluate its effectiveness on six public datasets. Extensive experiments show the factor-level attention enhanced models consistently outperform their counterparts, demonstrating the potential of differentiate user intents on the factor-level for ICF recommendation models.
73 - Fan Liu , Zhiyong Cheng , Lei Zhu 2021
Graph Convolution Networks (GCNs) manifest great potential in recommendation. This is attributed to their capability on learning good user and item embeddings by exploiting the collaborative signals from the high-order neighbors. Like other GCN model s, the GCN based recommendation models also suffer from the notorious over-smoothing problem - when stacking more layers, node embeddings become more similar and eventually indistinguishable, resulted in performance degradation. The recently proposed LightGCN and LR-GCN alleviate this problem to some extent, however, we argue that they overlook an important factor for the over-smoothing problem in recommendation, that is, high-order neighboring users with no common interests of a user can be also involved in the users embedding learning in the graph convolution operation. As a result, the multi-layer graph convolution will make users with dissimilar interests have similar embeddings. In this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. The subgraph consists of users with similar interests and their interacted items. To form the subgraphs, we design an unsupervised subgraph generation module, which can effectively identify users with common interests by exploiting both user feature and graph structure. To this end, our model can avoid propagating negative information from high-order neighbors into embedding learning. Experimental results on three large-scale benchmark datasets show that our model can gain performance improvement by stacking more layers and outperform the state-of-the-art GCN-based recommendation models significantly.
Using the generalized extreme value theory to characterize tail distributions, we address liquidation, leverage, and optimal margins for bitcoin long and short futures positions. The empirical analysis of perpetual bitcoin futures on BitMEX shows tha t (1) daily forced liquidations to out- standing futures are substantial at 3.51%, and 1.89% for long and short; (2) investors got forced liquidation do trade aggressively with average leverage of 60X; and (3) exchanges should elevate current 1% margin requirement to 33% (3X leverage) for long and 20% (5X leverage) for short to reduce the daily margin call probability to 1%. Our results further suggest normality assumption on return significantly underestimates optimal margins. Policy implications are also discussed.
Recent studies have pointed out that many well-developed Visual Question Answering (VQA) models are heavily affected by the language prior problem, which refers to making predictions based on the co-occurrence pattern between textual questions and an swers instead of reasoning visual contents. To tackle it, most existing methods focus on enhancing visual feature learning to reduce this superficial textual shortcut influence on VQA model decisions. However, limited effort has been devoted to providing an explicit interpretation for its inherent cause. It thus lacks a good guidance for the research community to move forward in a purposeful way, resulting in model construction perplexity in overcoming this non-trivial problem. In this paper, we propose to interpret the language prior problem in VQA from a class-imbalance view. Concretely, we design a novel interpretation scheme whereby the loss of mis-predicted frequent and sparse answers of the same question type is distinctly exhibited during the late training phase. It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer, to a given question whose right answer is sparse in the training set. Based upon this observation, we further develop a novel loss re-scaling approach to assign different weights to each answer based on the training data statistics for computing the final loss. We apply our approach into three baselines and the experimental results on two VQA-CP benchmark datasets evidently demonstrate its effectiveness. In addition, we also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
Factorization Machines (FMs) are effective in incorporating side information to overcome the cold-start and data sparsity problems in recommender systems. Traditional FMs adopt the inner product to model the second-order interactions between differen t attributes, which are represented via feature vectors. The problem is that the inner product violates the triangle inequality property of feature vectors. As a result, it cannot well capture fine-grained attribute interactions, resulting in sub-optimal performance. Recently, the Euclidean distance is exploited in FMs to replace the inner product and has delivered better performance. However, previous FM methods including the ones equipped with the Euclidean distance all focus on the attribute-level interaction modeling, ignoring the critical intrinsic feature correlations inside attributes. Thereby, they fail to model the complex and rich interactions exhibited in the real-world data. To tackle this problem, in this paper, we propose a FM framework equipped with generalized metric learning techniques to better capture these feature correlations. In particular, based on this framework, we present a Mahalanobis distance and a deep neural network (DNN) methods, which can effectively model the linear and non-linear correlations between features, respectively. Besides, we design an efficient approach for simplifying the model functions. Experiments on several benchmark datasets demonstrate that our proposed framework outperforms several state-of-the-art baselines by a large margin. Moreover, we collect a new large-scale dataset on second-hand trading to justify the effectiveness of our method over cold-start and data sparsity problems in recommender systems.
139 - Lei Zhu , Hui Cui , Zhiyong Cheng 2020
Social network stores and disseminates a tremendous amount of user shared images. Deep hashing is an efficient indexing technique to support large-scale social image retrieval, due to its deep representation capability, fast retrieval speed and low s torage cost. Particularly, unsupervised deep hashing has well scalability as it does not require any manually labelled data for training. However, owing to the lacking of label guidance, existing methods suffer from severe semantic shortage when optimizing a large amount of deep neural network parameters. Differently, in this paper, we propose a Dual-level Semantic Transfer Deep Hashing (DSTDH) method to alleviate this problem with a unified deep hash learning framework. Our model targets at learning the semantically enhanced deep hash codes by specially exploiting the user-generated tags associated with the social images. Specifically, we design a complementary dual-level semantic transfer mechanism to efficiently discover the potential semantics of tags and seamlessly transfer them into binary hash codes. On the one hand, instance-level semantics are directly preserved into hash codes from the associated tags with adverse noise removing. Besides, an image-concept hypergraph is constructed for indirectly transferring the latent high-order semantic correlations of images and tags into hash codes. Moreover, the hash codes are obtained simultaneously with the deep representation learning by the discrete hash optimization strategy. Extensive experiments on two public social image retrieval datasets validate the superior performance of our method compared with state-of-the-art hashing methods. The source codes of our method can be obtained at https://github.com/research2020-1/DSTDH
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا