113 - Zhenzhi Wang, Liyu Wu, Zhimin Li 2021
The Multi-modal Ads Video Understanding Challenge is the first grand challenge aiming to comprehensively understand ads videos. Our challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification. It asks the participants to accurately predict both the scene boundaries and the multi-label categories of each scene based on a fine-grained and ads-related category hierarchy. Our task therefore has four features that distinguish it from previous ones: ads domain, multi-modal information, temporal segmentation, and multi-label classification. It will advance the foundation of ads video understanding and have a significant impact on many ads applications like video recommendation. This paper presents an overview of our challenge, including the background of ads videos, an elaborate description of the task and dataset, the evaluation protocol, and our proposed baseline. By ablating the key components of our baseline, we would like to reveal the main challenges of this task and provide useful guidance for future research in this area. In this paper, we give an extended version of our challenge overview. The dataset will be publicly available at https://algo.qq.com/.
178 - Zhenzhi Wang, Limin Wang, Tao Wu 2021
Temporal grounding aims to temporally localize a moment in a video whose semantics are related to a given natural language query. Existing methods typically apply a detection or regression pipeline on the fused representation, with a focus on designing complicated heads and fusion strategies. Instead, viewing temporal grounding as a metric-learning problem, we present a Dual Matching Network (DMN) to directly model the relations between language queries and video moments in a joint embedding space. This new metric-learning framework enables fully exploiting negative samples from two new aspects: constructing negative cross-modal pairs from a dual matching scheme and mining negative pairs across different videos. These new negative samples could enhance the joint representation learning of the two modalities via cross-modal pair discrimination to maximize their mutual information. Experiments show that DMN achieves highly competitive performance compared with state-of-the-art methods on four video grounding benchmarks. Based on DMN, we present a winning solution for the STVG challenge of the 3rd PIC workshop. This suggests that metric learning is still a promising method for temporal grounding via capturing the essential cross-modal correlation in a joint embedding space.
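The idea of cross-modal pair discrimination in a joint embedding space can be illustrated with a generic symmetric contrastive (InfoNCE-style) loss. This is only a minimal sketch under stated assumptions, not the DMN loss itself: the function name `symmetric_contrastive_loss`, the temperature value, and the toy embeddings are all hypothetical, and each query is assumed to be paired with the moment at the same batch index while all other moments in the batch act as negatives.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(query_emb, moment_emb, temperature=0.1):
    """Generic two-direction contrastive loss (illustrative, not the paper's).

    query_emb, moment_emb: arrays of shape (batch, dim); row i of each is
    assumed to form the positive pair, all other rows serve as negatives.
    """
    # Cosine similarity in the joint embedding space.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    m = moment_emb / np.linalg.norm(moment_emb, axis=1, keepdims=True)
    sim = q @ m.T / temperature

    idx = np.arange(len(q))
    # Query-to-moment direction: each query must pick its own moment.
    q2m = -log_softmax(sim, axis=1)[idx, idx].mean()
    # Moment-to-query direction: each moment must pick its own query.
    m2q = -log_softmax(sim, axis=0)[idx, idx].mean()
    return 0.5 * (q2m + m2q)

# Toy check: correctly paired embeddings should score a lower loss
# than deliberately mismatched ones.
queries = np.eye(4)
loss_aligned = symmetric_contrastive_loss(queries, np.eye(4))
loss_shuffled = symmetric_contrastive_loss(queries, np.roll(np.eye(4), 1, axis=0))
print(loss_aligned < loss_shuffled)
```

Scoring both directions is one simple way to draw negatives from either modality; mining negatives across different videos would correspond here to building the batch from several videos.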
Zero-shot translation, directly translating between language pairs unseen in training, is a promising capability of multilingual neural machine translation (NMT). However, it usually suffers from capturing spurious correlations between the output language and language-invariant semantics due to the maximum likelihood training objective, leading to poor transfer performance on zero-shot translation. In this paper, we introduce a denoising autoencoder objective based on a pivot language into the traditional training objective to improve the translation accuracy on zero-shot directions. The theoretical analysis from the perspective of latent variables shows that our approach implicitly maximizes the probability distributions for zero-shot directions. On two benchmark machine translation datasets, we demonstrate that the proposed method is able to effectively eliminate the spurious correlations and significantly outperforms state-of-the-art methods. Our code is available at https://github.com/Victorwz/zs-nmt-dae.
355 - Zhi Wang, Chaoge Liu, Xiang Cui 2021
While artificial intelligence (AI) is widely applied in various areas, it is also being used maliciously. It is necessary to study and predict AI-powered attacks to prevent them in advance. Turning neural network models into stegomalware is a malicious use of AI, which utilizes the features of neural network models to hide malware while maintaining the performance of the models. However, the existing methods have a low malware embedding rate and a high impact on the model performance, making them impractical. Therefore, by analyzing the composition of neural network models, this paper proposes new methods to embed malware in models with high capacity and no service quality degradation. We used 19 malware samples and 10 mainstream models to build 550 malware-embedded models and analyzed the models' performance on the ImageNet dataset. A new evaluation method that combines the embedding rate, the model performance impact, and the embedding effort is proposed to evaluate the existing methods. This paper also designs a trigger and proposes an application scenario in attack tasks combining EvilModel with WannaCry. This paper further studies the relationship between a neural network model's embedding capacity and its structure, layers, and size. With the widespread application of artificial intelligence, utilizing neural networks for attacks is becoming a growing trend. We hope this work can provide a reference scenario for the defense against neural network-assisted attacks.
In recent years, online discussion and opinion sharing on social media have been booming worldwide. The re-entry prediction task has thus been proposed to help people keep track of the discussions they wish to continue. Nevertheless, existing works only focus on exploiting chatting history and context information, and ignore the potentially useful learning signals underlying conversation data, such as conversation thread patterns and repeated engagement of target users, which help better understand the behavior of target users in conversations. In this paper, we propose three well-founded auxiliary tasks, namely Spread Pattern, Repeated Target user, and Turn Authorship, as self-supervised signals for re-entry prediction. These auxiliary tasks are trained together with the main task in a multi-task manner. Experimental results on two datasets newly collected from Twitter and Reddit show that our method outperforms the previous state of the art with fewer parameters and faster convergence. Extensive experiments and analysis show the effectiveness of our proposed models and also point out some key ideas in designing self-supervised tasks.
The ability of machine learning models to deal with uncertainty has become as crucial as, if not more crucial than, their predictive ability itself. For instance, during the pandemic, governmental policies and personal decisions are constantly made around uncertainties. Targeting this, Neural Process Families (NPFs) have recently shed light on prediction with uncertainties by bridging Gaussian processes and neural networks. The latent neural process, a member of the NPF, is believed to be capable of modelling the uncertainty on certain points (local uncertainty) as well as the general function priors (global uncertainties). Nonetheless, some critical questions remain unresolved, such as a formal definition of global uncertainties, the causality behind global uncertainties, and the manipulation of global uncertainties for generative models. Regarding this, we build a new member, the GloBal Convolutional Neural Process (GBCoNP), that achieves the SOTA log-likelihood in latent NPFs. It designs a global uncertainty representation p(z), which is an aggregation on a discretized input space. The causal effect between the degree of global uncertainty and the intra-task diversity is discussed. The learnt prior is analyzed on a variety of scenarios, including 1D, 2D, and a newly proposed spatial-temporal COVID dataset. Our manipulation of the global uncertainty not only generates the desired samples to tackle few-shot learning, but also enables probability evaluation on the functional priors.
In this paper, we propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing. However, directly applying this method suffers heavily from the dialogue entity inconsistency caused by the removal of delexicalized tokens, as well as the catastrophic forgetting problem of the pre-trained model during fine-tuning, leading to unsatisfactory performance. To alleviate these problems, we design a novel GPT-Adapter-CopyNet network, which incorporates lightweight adapter and CopyNet modules into GPT-2 to achieve better performance on transfer learning and dialogue entity generation. Experimental results on the DSTC8 Track 1 benchmark and the MultiWOZ dataset demonstrate that our proposed approach significantly outperforms baseline models on automatic and human evaluations.
Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages high-level analysis and accurate light-weight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3x speedup. The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
The nucleon momentum distribution (NMD), particularly its high-momentum components, is essential for understanding the nucleon-nucleon ($NN$) correlations in nuclei. Herein, we study the NMD of $^{56}\text{Fe}$ using the axially deformed relativistic mean-field (RMF) model. Moreover, we introduce the effects of $NN$ correlations into the RMF model from phenomenological models based on the deuteron and nuclear matter. For the region $k<k_{\text{F}}$, the effects of deformation on the NMD of the RMF model are investigated using the total and single-particle NMDs. For the region $k>k_{\text{F}}$, the high-momentum components of the RMF model are modified by the effects of $NN$ correlations, and the results agree with the experimental data. By comparing the NMDs of relativistic and non-relativistic mean-field models, the relativistic effects on nuclear structures in momentum space are analyzed. Finally, by analogy with the tensor correlations in the deuteron and Jastrow-type correlations in nuclear matter, the behaviors and contributions of $NN$ correlations in $^{56}\text{Fe}$ are further analyzed, which helps clarify the effects of the tensor force on the NMD of heavy nuclei.
The expression for the radial moments $\left\langle r^{n}\right\rangle_{c}$ of the nuclear charge density has recently been discussed under the plane wave Born approximation (PWBA), which is significant for investigating the nuclear surface thickness and neutron distribution radius. In this paper, we extend the studies to extracting the second-order moment $\left\langle r^{2}\right\rangle_{c}$ and fourth-order moment $\left\langle r^{4}\right\rangle_{c}$ from the Coulomb form factors $|F_{C}(q)|^2$ by the distorted wave Born approximation (DWBA) in the region of small momentum transfer $q$. Based on relativistic mean-field (RMF) calculations, the DWBA form factors $F_{C}^{DW}(q)$ are expanded up to order $q^4$, where the corresponding charge distributions are corrected by the contributions of neutron and spin-orbit densities. In the small-$q$ region, it is found that the experimental $|F_{C}(q)|^2$ can be well reproduced by considering the contributions of $\left\langle r^{4}\right\rangle_{c}$. Through further analysis of the second-order and fourth-order expansion coefficients of $F_{C}^{DW}(q)$, the relationship between the expansion coefficients and the proton number $Z$ is obtained. Using this relationship, we extract $\left\langle r^{2}\right\rangle_{c}$ and $\left\langle r^{4}\right\rangle_{c}$ from the limited experimental form factor data in the small-$q$ region. Within the permissible range of error, the extracted $\left\langle r^{n}\right\rangle_{c}$ are consistent with the experimental data in this paper.
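For orientation, the link between the form factor and the radial moments in the plane-wave picture can be written as the standard small-$q$ expansion. This is a textbook sketch, assuming a spherically symmetric charge density normalized to unity and $q$ expressed in inverse length units; the abstract does not state these conventions, and the DWBA expansion coefficients studied in the paper differ from the plane-wave values shown here.

$$
F_{C}(q) = \int \rho_{c}(r)\,\frac{\sin(qr)}{qr}\,d^{3}r
= 1 - \frac{q^{2}}{6}\left\langle r^{2}\right\rangle_{c} + \frac{q^{4}}{120}\left\langle r^{4}\right\rangle_{c} - \cdots
$$

The factors $6 = 3!$ and $120 = 5!$ come from the Taylor series of $\sin(qr)/(qr)$, so the $q^{2}$ term fixes the second-order moment and the $q^{4}$ term the fourth-order moment.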