Despite the huge progress in scene graph generation in recent years, the long-tail distribution of object relationships remains a challenging and persistent issue. Existing methods largely rely on either external knowledge or statistical bias information to alleviate this problem. In this paper, we tackle the issue from two other aspects: (1) scene-object interaction, which learns scene-specific knowledge via an additive attention mechanism, as sketched below; and (2) long-tail knowledge transfer, which transfers the rich knowledge learned from head classes to tail classes. Extensive experiments on three tasks on the benchmark Visual Genome dataset demonstrate that our method outperforms current state-of-the-art competitors.
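As a concrete reading of the scene-object interaction component, the following is a minimal sketch of an additive (Bahdanau-style) attention mechanism in which object features attend to a global scene feature. The module name, the dimensions, and the choice of a single global scene vector are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SceneObjectAttention(nn.Module):
    """Additive attention between per-object features and a global
    scene feature -- an illustrative sketch, not the paper's code."""

    def __init__(self, d_obj: int, d_scene: int, d_attn: int):
        super().__init__()
        self.w_obj = nn.Linear(d_obj, d_attn, bias=False)
        self.w_scene = nn.Linear(d_scene, d_attn, bias=False)
        self.v = nn.Linear(d_attn, 1, bias=False)

    def forward(self, obj_feats: torch.Tensor, scene_feat: torch.Tensor):
        # obj_feats: (N, d_obj) features of the N detected objects
        # scene_feat: (d_scene,) global feature of the whole scene
        scores = self.v(torch.tanh(self.w_obj(obj_feats) + self.w_scene(scene_feat)))  # (N, 1)
        weights = torch.softmax(scores, dim=0)                                          # (N, 1)
        context = (weights * obj_feats).sum(dim=0)                                      # (d_obj,)
        return context, weights

# Usage: attend over 5 objects given one scene feature.
attn = SceneObjectAttention(d_obj=256, d_scene=512, d_attn=128)
context, weights = attn(torch.randn(5, 256), torch.randn(512))
```

The scoring function is the standard additive form, v^T tanh(W1 h_i + W2 s): each object is scored against the scene by a small feed-forward network, and the softmax weights indicate how much scene-specific knowledge each object receives.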
Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse human walk on / sit on / lay on beach into human on beach. Given such SGG, down-stream tasks such as VQA can hardly infer better scene structures than merely a bag of objects. …
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild. Perception errors often lead to nonsensical compositions in the output scene graph, which do not follow real-world rules and patterns, and can be corrected using commonsense knowledge. …
Despite recent advancements in single-domain or single-object image generation, it is still challenging to generate complex scenes containing diverse, multiple objects and their interactions. Scene graphs, composed of nodes as objects and directed edges as relationships between objects, …
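To make the representation described in the previous abstract concrete, here is a minimal sketch of a scene graph as a data structure; the class and field names are illustrative assumptions, not from any of the papers above.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # Nodes are objects; directed edges are (subject_idx, predicate, object_idx) triples.
    objects: list[str] = field(default_factory=list)
    edges: list[tuple[int, str, int]] = field(default_factory=list)

# "human walk on beach": node 0 is the subject, node 1 the object.
g = SceneGraph(objects=["human", "beach"], edges=[(0, "walk on", 1)])
```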
Learning from image-text data has demonstrated recent success for many recognition tasks, yet is currently limited to visual features or individual visual concepts such as objects. In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph. …
Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. …