ﻻ يوجد ملخص باللغة العربية
Visual arts are of inestimable importance for the cultural, historic and economic growth of our society. One of the building blocks of most analysis in visual arts is to find similarity relationships among paintings of different artists and painting schools. To help art historians better understand visual arts, this paper presents a framework for visual link retrieval and knowledge discovery in digital painting datasets. Visual link retrieval is accomplished by using a deep convolutional neural network to perform feature extraction and a fully unsupervised nearest neighbor mechanism to retrieve links among digitized paintings. Historical knowledge discovery is achieved by performing a graph analysis that makes it possible to study influences among artists. An experimental evaluation on a database collecting paintings by very popular artists shows the effectiveness of the method. The unsupervised strategy makes the method interesting especially in cases where metadata are scarce, unavailable or difficult to collect.
Question answering is an important task for autonomous agents and virtual assistants alike and was shown to support the disabled in efficiently navigating an overwhelming environment. Many existing methods focus on observation-based questions, ignori
In this work, we address multi-modal information needs that contain text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which in our case we defin
Machine learning models are known to perpetuate and even amplify the biases present in the data. However, these data biases frequently do not become apparent until after the models are deployed. Our work tackles this issue and enables the preemptive
Visual navigation localizes a query place image against a reference database of place images, also known as a `visual map. Localization accuracy requirements for specific areas of the visual map, `scene classes, vary according to the context of the e
We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual,