It is widely agreed that small models perform quite poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model to transfer knowledge to the small one via knowledge distillation. Despite their effectiveness, distillation-based methods may not be suitable for some resource-restricted scenarios due to the huge computational expense of deploying a large model. In this paper, we study the issue of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) small models can complete the pretext task without overfitting despite their limited capacity; (ii) small models universally suffer from the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baselines of five small architectures by considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals.
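For readers unfamiliar with the contrastive pretext task mentioned above, the following is a minimal sketch of an InfoNCE-style loss, a common formulation of contrastive learning. It is an illustration only: the abstract does not state which loss the authors use, and the function names, the temperature value, and the toy vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE-style loss for one anchor (hypothetical helper).

    The positive is another view of the same sample; the negatives are
    views of other samples. The temperature of 0.2 is an assumed value.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - logits[0]


# Toy 2-D embeddings (made up for illustration).
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In this toy setup the loss is lower when the positive lies close to the anchor than when a negative does, which is the property the pretext task optimizes.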
The challenge of Class Incremental Learning (CIL) lies in the difficulty for a learner to discern old-class data from new-class data, since no data from previous classes is preserved. In this paper, we reveal three causes of catastrophic forgetting at the representational level, namely, representation forgetting, representation overlapping, and classifier deviation. Based on this observation, we propose a new CIL framework, Contrastive Class Concentration for CIL (C4IL), which alleviates the phenomenon of representation overlapping and works with both memory-based and memory-free methods. Our framework leverages the class concentration effect of contrastive representation learning, thereby yielding a representation distribution with better intra-class compactness and inter-class separability. Quantitative experiments showcase the effectiveness of our framework: it outperforms the baseline methods by 5% in terms of the average and top-1 accuracy in 10-phase and 20-phase CIL. Qualitative results also demonstrate that our method generates a more compact representation distribution that alleviates the overlapping problem.
Video-and-Language Inference is a recently proposed task for joint video-and-language understanding. This new task requires a model to infer whether a natural language statement entails or contradicts a given video clip. In this paper, we study how to address three critical challenges for this task: judging the global correctness of a statement involving multiple semantic meanings, joint reasoning over video and subtitles, and modeling long-range relationships and complex social interactions. First, we propose an adaptive hierarchical graph network that achieves in-depth understanding of the video over complex interactions. Specifically, it performs joint reasoning over video and subtitles in three hierarchies, where the graph structure is adaptively adjusted according to the semantic structures of the statement. Second, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network across the three hierarchies. Semantic coherence learning can further improve the alignment between vision and language, and the coherence across a sequence of video segments. Experimental results show that our method outperforms the baseline by a large margin.
In many real-world games, such as traders repeatedly bargaining with customers, it is very hard for a single AI trader to make good deals with various customers in a few turns, since customers may adopt different strategies even if the strategies they choose are quite simple. In this paper, we model this problem as fast adaptive learning in finitely repeated games. We believe that past game history plays a vital role in such a learning procedure, and therefore we propose a novel framework (named F3) to fuse the past and current game history with an Opponent Action Estimator (OAE) module that uses past game history to estimate the opponent's future behaviors. The experiments show that the agent trained by F3 can quickly defeat opponents who adopt unknown new strategies. The F3-trained agent obtains more rewards in a fixed number of turns than agents trained by deep reinforcement learning. Further studies show that the OAE module in F3 contains meta-knowledge that can even be transferred across different games.
When patients need to take medicine, particularly when taking more than one kind of drug simultaneously, they should be aware that drug-drug interactions may exist. Interactions between drugs may have a negative impact on patients or even cause death. Generally, drugs that conflict with a specific drug (or label drug) are described in its drug label or package insert. Since more and more new drug products are coming onto the market, it is difficult to collect such information manually. We take part in the Drug-Drug Interaction (DDI) Extraction from Drug Labels challenge of the Text Analysis Conference (TAC) 2018, choosing Task 1 and Task 2 to automatically extract DDI-related mentions and DDI relations, respectively. Instead of treating Task 1 as a named entity recognition (NER) task and Task 2 as a relation extraction (RE) task and then solving them in a pipeline, we propose a two-step joint model to detect DDIs and their related mentions jointly. A sequence tagging system (CNN-GRU encoder-decoder) first finds precipitants; in the second step, it searches for the fine-grained Trigger of each precipitant and determines its DDI. Moreover, a rule-based model is built to determine the sub-type of pharmacokinetic interactions. Our system achieved the best results in both Task 1 and Task 2. The F-measure reaches 0.46 in Task 1 and 0.40 in Task 2.
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations. However, the generated training data typically contain massive noise, which may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C$^2$SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, adopting a similar selective attention mechanism to achieve this goal. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
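The sentence-level selective attention described above can be sketched as follows. This is a minimal illustration of generic selective attention over a bag of sentences, not the C$^2$SA model itself: the dot-product scoring function, the function names, and the toy vectors are assumptions, and the cross-relation and cross-bag components are not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(sentence_vecs, relation_query):
    """Weight the sentences in one bag by relevance to a relation query.

    Assumption: relevance is a plain dot product between each sentence
    vector and the relation query vector (hypothetical scoring choice).
    Returns the attention weights and the weighted bag representation.
    """
    scores = [
        sum(x * q for x, q in zip(vec, relation_query))
        for vec in sentence_vecs
    ]
    weights = softmax(scores)
    dim = len(relation_query)
    bag_vec = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vecs))
        for i in range(dim)
    ]
    return weights, bag_vec


# Toy bag of three sentence embeddings for one entity pair (made up).
bag = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
query = [1.0, 0.0]
weights, bag_vec = selective_attention(bag, query)
```

Sentences that score poorly against the relation query receive small weights, so a noisy or mismatched sentence contributes little to the bag representation used for classification.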