Incremental learning of semantic segmentation has emerged as a promising strategy for visual scene interpretation in the open-world setting. However, it remains challenging to acquire novel classes in an online fashion for the segmentation task, mainly due to its continuously-evolving semantic label space, partial pixelwise ground-truth annotations, and constrained data availability. To address this, we propose an incremental learning strategy that can rapidly adapt deep segmentation models without catastrophic forgetting, using streaming input data with pixel annotations on the novel classes only. To this end, we develop a unified learning strategy based on the Expectation-Maximization (EM) framework, which integrates an iterative relabeling strategy that fills in the missing labels and a rehearsal-based incremental learning step that balances the stability and plasticity of the model. Moreover, our EM algorithm adopts an adaptive sampling method to select informative training data and a class-balancing training strategy in the incremental model updates, both improving the efficacy of model learning. We validate our approach on the PASCAL VOC 2012 and ADE20K datasets, and the results demonstrate its superior performance over existing incremental learning methods.
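Concretely, the relabeling-plus-rehearsal loop can be summarized in a short sketch. The PyTorch code below is only an illustration of the EM recipe described in the abstract, not the authors' implementation: the confidence threshold, the `novel_mask` convention (ground-truth novel-class labels with -1 for unannotated pixels), and the class-weighted cross-entropy are assumptions made for the example.

```python
# Illustrative sketch of the EM-style incremental update (assumed details noted above).
import torch
import torch.nn.functional as F

CONF_THRESH = 0.7   # assumed confidence threshold for accepting pseudo-labels
IGNORE_INDEX = 255  # pixels that remain unlabeled after relabeling

def e_step_relabel(old_model, image, novel_mask):
    """E-step: fill in missing old-class labels with the frozen previous model.

    novel_mask holds ground-truth novel-class labels, with -1 where no annotation exists.
    """
    with torch.no_grad():
        probs = old_model(image).softmax(dim=1)   # (B, C_old, H, W)
    conf, pseudo = probs.max(dim=1)               # (B, H, W)
    pseudo[conf < CONF_THRESH] = IGNORE_INDEX     # keep only confident pseudo-labels
    return torch.where(novel_mask >= 0, novel_mask, pseudo)

def m_step_update(model, optimizer, batch, rehearsal_batch, class_weights):
    """M-step: class-balanced update on the new data plus a rehearsal batch."""
    images = torch.cat([batch["image"], rehearsal_batch["image"]])
    targets = torch.cat([batch["target"], rehearsal_batch["target"]])
    loss = F.cross_entropy(model(images), targets,
                           weight=class_weights, ignore_index=IGNORE_INDEX)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```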
Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications. However, it is particularly challenging to learn a class-invariant spatial-temporal representation in such a setting. To address this, we propose a novel matching-based few-shot learning strategy for video sequences in this work. Our main idea is to introduce an implicit temporal alignment for a video pair, capable of estimating the similarity between them in an accurate and robust manner. Moreover, we design an effective context encoding module to incorporate spatial and feature channel context, resulting in better modeling of intra-class variations. To train our model, we develop a multi-task loss for learning video matching, leading to video features with better generalization. Extensive experimental results on two challenging benchmarks show that our method outperforms prior art by a sizable margin on SomethingSomething-V2 and achieves competitive results on Kinetics.
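The implicit alignment idea can be made concrete as a small similarity function over per-frame embeddings. The sketch below uses softmax-weighted (soft) frame matching as an illustrative stand-in for the alignment module described in the abstract; the temperature, the normalization, and the pooling choices are assumptions, not the paper's exact design.

```python
# Illustrative soft temporal alignment score for a (query, support) video pair.
import torch
import torch.nn.functional as F

def aligned_similarity(query_frames, support_frames, temperature=0.1):
    """Score two clips without an explicit frame-to-frame matching.

    query_frames:   (Tq, D) frame embeddings of the query clip
    support_frames: (Ts, D) frame embeddings of the support clip
    """
    q = F.normalize(query_frames, dim=-1)
    s = F.normalize(support_frames, dim=-1)
    sim = q @ s.t()                              # (Tq, Ts) frame-level cosine similarities
    # Each query frame attends softly to the support frames it matches best,
    # which tolerates temporal shifts between the two clips.
    align = (sim / temperature).softmax(dim=-1)  # (Tq, Ts) soft alignment weights
    per_frame_score = (align * sim).sum(dim=-1)  # (Tq,) soft-matched similarity per frame
    return per_frame_score.mean()                # scalar video-to-video similarity

# Usage idea: classify a query clip by the class of its highest-scoring support clip.
```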
Let $\mathcal{F}$ be a family of graphs. A graph $G$ is called \textit{$\mathcal{F}$-free} if for any $F\in \mathcal{F}$, there is no subgraph of $G$ isomorphic to $F$. Given a graph $T$ and a family of graphs $\mathcal{F}$, the generalized Tur\'{a}n number of $\mathcal{F}$ is the maximum number of copies of $T$ in an $\mathcal{F}$-free graph on $n$ vertices, denoted by $ex(n,T,\mathcal{F})$. A linear forest is a graph whose connected components are all paths or isolated vertices. Let $\mathcal{L}_{n,k}$ be the family of all linear forests of order $n$ with $k$ edges and $K^*_{s,t}$ a graph obtained from $K_{s,t}$ by substituting the part of size $s$ with a clique of the same size. In this paper, we determine the exact values of $ex(n,K_s,\mathcal{L}_{n,k})$ and $ex(n,K^*_{s,t},\mathcal{L}_{n,k})$. Also, we study the case of this problem when the \textit{host graph} is bipartite. Denote by $ex_{bip}(n,T,\mathcal{F})$ the maximum possible number of copies of $T$ in an $\mathcal{F}$-free bipartite graph with each part of size $n$. We determine the exact value of $ex_{bip}(n,K_{s,t},\mathcal{L}_{n,k})$. Our proof is mainly based on the shifting method.
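As a quick sanity check on the notation (a standard observation, not a result claimed by the paper), taking $T = K_2$ counts edges, so the generalized Tur\'{a}n number reduces to the classical one:
\[
  ex(n,T,\mathcal{F}) = \max\bigl\{\, \#\{\text{copies of } T \text{ in } G\} \;:\; |V(G)| = n,\ G \text{ is } \mathcal{F}\text{-free} \,\bigr\},
  \qquad
  ex(n,K_2,\mathcal{F}) = ex(n,\mathcal{F}).
\]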
Despite the recent success of deep network-based Reinforcement Learning (RL), it remains elusive to achieve human-level efficiency in learning novel tasks. While previous efforts attempt to address this challenge using meta-learning strategies, they typically suffer from sampling inefficiency with on-policy RL algorithms or meta-overfitting with off-policy learning. In this work, we propose a novel meta-RL strategy to address those limitations. In particular, we decompose the meta-RL problem into three sub-tasks: task-exploration, task-inference, and task-fulfillment, which we instantiate with two deep network agents and a task encoder. During meta-training, our method learns a task-conditioned actor network for task-fulfillment, an explorer network with a self-supervised reward shaping that encourages task-informative experiences in task-exploration, and a context-aware graph-based task encoder for task inference. We validate our approach with extensive experiments on several public benchmarks, and the results show that our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
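The three-part decomposition can be sketched with small placeholder modules. The PyTorch code below is a schematic of the explorer / task-encoder / actor split described in the abstract, not the paper's architecture: the layer sizes, the mean-pooled context encoder (standing in for the graph-based encoder), and the conditioning scheme are illustrative assumptions.

```python
# Schematic components for the explorer / task-encoder / actor decomposition
# (placeholder architecture; see the caveats above).
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Task-inference: map a context set of transitions to a task embedding z."""
    def __init__(self, transition_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(transition_dim, 128), nn.ReLU(),
                                 nn.Linear(128, z_dim))

    def forward(self, context):               # context: (N, transition_dim)
        return self.net(context).mean(dim=0)  # pooled stand-in for the graph-based encoder

class ConditionedPolicy(nn.Module):
    """Actor or explorer: a policy head conditioned on the state and the task embedding."""
    def __init__(self, state_dim, z_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

# Meta-training flow (schematic): the explorer collects task-informative
# transitions (rewarded by a self-supervised bonus), the encoder infers z from
# that context, and the actor is optimized to solve the task conditioned on z.
```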