ﻻ يوجد ملخص باللغة العربية
Zero-shot action recognition is the task of recognizingaction classes without visual examples, only with a seman-tic embedding which relates unseen to seen classes. Theproblem can be seen as learning a function which general-izes well to instances of unseen classes without losing dis-crimination between classes. Neural networks can modelthe complex boundaries between visual classes, which ex-plains their success as supervised models. However, inzero-shot learning, these highly specialized class bound-aries may not transfer well from seen to unseen classes.In this paper we propose a centroid-based representation,which clusters visual and semantic representation, consid-ers all training samples at once, and in this way generaliz-ing well to instances from unseen classes. We optimize theclustering using Reinforcement Learning which we show iscritical for our approach to work. We call the proposedmethod CLASTER and observe that it consistently outper-forms the state-of-the-art in all standard datasets, includ-ing UCF101, HMDB51 and Olympic Sports; both in thestandard zero-shot evaluation and the generalized zero-shotlearning. Further, we show that our model performs com-petitively in the image domain as well, outperforming thestate-of-the-art in many settings.
Although there has been significant research in egocentric action recognition, most methods and tasks, including EPIC-KITCHENS, suppose a fixed set of action classes. Fixed-set classification is useful for benchmarking methods, but is often unrealist
Zero-shot action recognition is the task of classifying action categories that are not available in the training set. In this setting, the standard evaluation protocol is to use existing action recognition datasets(e.g. UCF101) and randomly split the
There are many realistic applications of activity recognition where the set of potential activity descriptions is combinatorially large. This makes end-to-end supervised training of a recognition system impractical as no training set is practically a
Few-shot learning (FSL) for action recognition is a challenging task of recognizing novel action categories which are represented by few instances in the training data. In a more generalized FSL setting (G-FSL), both seen as well as novel action cate
Zero-shot learning (ZSL) aims to recognize unseen object classes without any training samples, which can be regarded as a form of transfer learning from seen classes to unseen ones. This is made possible by learning a projection between a feature spa