
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Published by: Michael S. Ryoo
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in visual data. This results in efficiently and effectively finding a few important visual tokens and enables modeling of pairwise attention between such tokens, over a longer temporal horizon for videos, or over the spatial content in images. Our experiments demonstrate strong performance on several challenging benchmarks for both image and video recognition tasks. Importantly, because our tokens are adaptive, we achieve competitive results at significantly reduced compute cost.
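The core operation can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration, not the paper's exact module (which uses a small convolutional attention block trained end to end): a single stand-in projection scores every spatial position for each learned token, the scores are normalised over space, and each token is the resulting weighted average of the feature map. The `w_attn` weights and all shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(feature_map, w_attn):
    """Sketch of the token-learning idea: reduce an H x W grid of features
    to a handful of adaptively weighted tokens.

    feature_map: (H, W, C) visual features.
    w_attn:      (C, num_tokens) weights of a stand-in 1x1 projection that
                 scores each spatial position for each token (illustrative).
    Returns:     (num_tokens, C) learned tokens.
    """
    H, W, C = feature_map.shape
    flat = feature_map.reshape(H * W, C)   # (HW, C)
    scores = flat @ w_attn                 # (HW, num_tokens)
    attn = softmax(scores, axis=0)         # normalise over spatial positions
    return attn.T @ flat                   # (num_tokens, C) weighted averages

# Toy usage: a 14x14x64 feature map reduced to 8 tokens.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(14, 14, 64))
w = 0.1 * rng.normal(size=(64, 8))
print(token_learner(fmap, w).shape)        # (8, 64)
```

Downstream attention layers then operate on these few tokens instead of hundreds of densely sampled patch tokens, which is where the compute savings come from.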




Read also

The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture how the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing what the agent should strive to do.
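As a rough, self-contained illustration of the meta-gradient idea in this abstract, the toy sketch below treats the agent and the intrinsic reward as single scalars: the agent follows the learned intrinsic reward, while the reward parameter is adjusted (here via finite differences, purely for simplicity) so that the agent's extrinsic return improves. The functions, learning rates, and dynamics are illustrative assumptions, not the paper's actual framework.

```python
# Toy meta-gradient sketch: learn an intrinsic reward so that an agent
# which follows it ends up maximising the true (extrinsic) objective.
# The intrinsic reward is -(theta - eta)^2, i.e. it pulls the agent toward eta.

def extrinsic_return(theta):
    return -(theta - 3.0) ** 2              # toy true objective, maximised at 3

def inner_update(theta, eta, lr=0.5):
    # one gradient-ascent step of the agent on the intrinsic reward -(theta - eta)^2
    return theta + lr * (-2.0 * (theta - eta))

eta, theta = 0.0, 0.0
for _ in range(200):
    eps = 1e-4                              # finite-difference meta-gradient w.r.t. eta
    up = extrinsic_return(inner_update(theta, eta + eps))
    dn = extrinsic_return(inner_update(theta, eta - eps))
    eta += 0.1 * (up - dn) / (2 * eps)      # outer (meta) update of the reward parameter
    theta = inner_update(theta, eta)        # the agent keeps following the learned reward

print(round(eta, 2), round(theta, 2))       # eta steers theta toward ~3.0
```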
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual player and coordinated team behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state of the art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
Lilian Besson, 2018
An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the Doubling Trick. In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O(\log T)$. We give insights as to why exponential doubling tricks may be better, as they conserve bounds in $R_T = O(\log T)$, and are close to conserving bounds in $R_T = O(\sqrt{T})$.
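The trick itself is simple enough to sketch. Below, a hypothetical `run_fixed_horizon(T)` callable (an assumption standing in for any non-anytime bandit algorithm) is restarted on a growing sequence of horizons; the geometric and exponential schedules correspond to the two families of sequences discussed in the abstract.

```python
def doubling_trick(run_fixed_horizon, total_T, schedule="geometric", T0=10, b=2):
    """Sketch of the Doubling Trick: make a fixed-horizon algorithm anytime
    by restarting it on growing horizons.

    run_fixed_horizon(T): illustrative callable that plays T rounds with the
                          wrapped algorithm and returns the reward collected.
    schedule: "geometric"   -> horizons T0, T0*b, T0*b^2, ...
              "exponential" -> horizons T0, T0^b, T0^(b^2), ...
    """
    played, total_reward, i = 0, 0.0, 0
    while played < total_T:
        if schedule == "geometric":
            T_i = T0 * b ** i
        else:
            T_i = int(T0 ** (b ** i))
        T_i = min(T_i, total_T - played)     # clip the final restart
        total_reward += run_fixed_horizon(T_i)
        played += T_i
        i += 1
    return total_reward

# Toy usage with a stand-in algorithm that earns 0.5 reward per round.
print(doubling_trick(lambda T: 0.5 * T, total_T=1000))   # 500.0
```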
Machine learning techniques work best when the data used for training resembles the data used for evaluation. This holds true for learned single-image denoising algorithms, which are applied to real raw camera sensor readings but, due to practical constraints, are often trained on synthetic image data. Though it is understood that generalizing from synthetic to real data requires careful consideration of the noise properties of image sensors, the other aspects of a camera's image processing pipeline (gain, color correction, tone mapping, etc.) are often overlooked, despite their significant effect on how raw measurements are transformed into finished images. To address this, we present a technique to unprocess images by inverting each step of an image processing pipeline, thereby allowing us to synthesize realistic raw sensor measurements from commonly available internet photos. We additionally model the relevant components of an image processing pipeline when evaluating our loss function, which allows training to be aware of all relevant photometric processing that will occur after denoising. By processing and unprocessing model outputs and training data in this way, we are able to train a simple convolutional neural network that has 14%-38% lower error rates and is 9x-18x faster than the previous state of the art on the Darmstadt Noise Dataset, and generalizes to sensors outside of that dataset as well.
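A drastically simplified version of the "unprocessing" step can be sketched as inverting just two pipeline stages: sRGB gamma and per-channel white-balance gains. The gain values and the two-stage pipeline below are illustrative assumptions; the actual method also inverts colour correction and tone mapping and then adds realistic sensor noise.

```python
import numpy as np

def unprocess(srgb, wb_gains=(2.0, 1.0, 1.7)):
    """Sketch: approximate raw sensor values from a finished sRGB image by
    inverting a toy two-stage pipeline (gamma, then white-balance gains)."""
    srgb = np.clip(srgb, 0.0, 1.0)
    # invert the sRGB gamma curve to recover linear intensities
    linear = np.where(srgb <= 0.04045,
                      srgb / 12.92,
                      ((srgb + 0.055) / 1.055) ** 2.4)
    # undo the per-channel white-balance gains applied during processing
    raw = linear / np.asarray(wb_gains)
    return np.clip(raw, 0.0, 1.0)

img = np.random.rand(8, 8, 3)        # a toy stand-in for an internet photo
print(unprocess(img).shape)          # (8, 8, 3) pseudo-raw measurements
```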
Daniel S. Roche, 2018
Simply put, a sparse polynomial is one whose zero coefficients are not explicitly stored. Such objects are ubiquitous in exact computing, and so naturally we would like to have efficient algorithms to handle them. However, with this compact storage come new algorithmic challenges, as fast algorithms for dense polynomials may no longer be efficient. In this tutorial we examine the state of the art for sparse polynomial algorithms in three areas: arithmetic, interpolation, and factorization. The aim is to highlight recent progress both in theory and in practice, as well as opportunities for future work.
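The representational idea is easy to make concrete: store only the nonzero terms, so the cost of arithmetic scales with the number of terms rather than with the degree. The dict-based encoding and multiplication below are a minimal sketch of that idea, not any particular algorithm from the tutorial.

```python
def sparse_mul(p, q):
    """Multiply two sparse polynomials stored as {exponent: coefficient}
    dicts; zero coefficients are never stored."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = e1 + e2
            c = out.get(e, 0) + c1 * c2
            if c:
                out[e] = c
            elif e in out:
                del out[e]               # drop terms that cancel to zero
    return out

# (1 + x^1000) * (2 + 3*x^1000000): only four terms despite the huge degrees.
print(sparse_mul({0: 1, 1000: 1}, {0: 2, 1_000_000: 3}))
```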


