ﻻ يوجد ملخص باللغة العربية
Reasoning about images/objects and their hierarchical interactions is a key concept for the next generation of computer vision approaches. Here we present a new framework to deal with it through a visual hierarchical context-based reasoning. Current reasoning methods use the fine-grained labels from images objects and their interactions to predict labels to new objects. Our framework modifies this current information flow. It goes beyond and is independent of the fine-grained labels from the objects to define the image context. It takes into account the hierarchical interactions between different abstraction levels (i.e. taxonomy) of information in the images and their bounding-boxes. Besides these connections, it considers their intrinsic characteristics. To do so, we build and apply graphs to graph convolution networks with convolutional neural networks. We show a strong effectiveness over widely used convolutional neural networks, reaching a gain 3 times greater on well-known image datasets. We evaluate the capability and the behavior of our framework under different scenarios, considering distinct (superclass, subclass and hierarchical) granularity levels. We also explore attention mechanisms through graph attention networks and pre-processing methods considering dimensionality expansion and/or reduction of the features representations. Further analyses are performed comparing supervised and semi-supervised approaches.
Neural Module Network (NMN) exhibits strong interpretability and compositionality thanks to its handcrafted neural modules with explicit multi-hop reasoning capability. However, most NMNs suffer from two critical drawbacks: 1) scalability: customized
Deep neural networks have achieved impressive success in large-scale visual object recognition tasks with a predefined set of classes. However, recognizing objects of novel classes unseen during training still remains challenging. The problem of dete
How do we determine whether two or more clothing items are compatible or visually appealing? Part of the answer lies in understanding of visual aesthetics, and is biased by personal preferences shaped by social attitudes, time, and place. In this wor
Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity
Humans learn to solve tasks of increasing complexity by building on top of previously acquired knowledge. Typically, there exists a natural progression in the tasks that we learn - most do not require completely independent solutions, but can be brok