ﻻ يوجد ملخص باللغة العربية
We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories. In our setting, the training data for the source categories have bounding box annotations, while those for the target categories only have image-level annotations. Current state-of-the-art approaches focus on image-level visual or semantic similarity to adapt a detector trained on the source categories to the new target categories. In contrast, our key idea is to (i) use similarity not at the image-level, but rather at the region-level, and (ii) leverage richer common-sense (based on attribute, spatial, etc.) to guide the algorithm towards learning the correct detections. We acquire such common-sense cues automatically from readily-available knowledge bases without any extra human effort. On the challenging MS COCO dataset, we find that common-sense knowledge can substantially improve detection performance over existing transfer-learning baselines.
The task of identifying and reasoning with circumstantial preconditions associated with everyday facts is natural to humans. It is unclear whether state-of-the-art language models (LMs) understand the implicit preconditions that enable or invalidate
In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a hig
Cant is important for understanding advertising, comedies and dog-whistle politics. However, computational research on cant is hindered by a lack of available datasets. In this paper, we propose a large and diverse Chinese dataset for creating and un
Feature selection, an effective technique for dimensionality reduction, plays an important role in many machine learning systems. Supervised knowledge can significantly improve the performance. However, faced with the rapid growth of newly emerging c
We integrate two powerful ideas, geometry and deep visual representation learning, into recurrent network architectures for mobile visual scene understanding. The proposed networks learn to lift and integrate 2D visual features over time into latent