ﻻ يوجد ملخص باللغة العربية
We present a novel approach to object classification and detection which requires minimal supervision and which combines visual texture cues and shape information learned from freely available unlabeled web search results. The explosion of visual data on the web can potentially make visual examples of almost any object easily accessible via web search. Previous unsupervised methods have utilized either large scale sources of texture cues from the web, or shape information from data such as crowdsourced CAD models. We propose a two-stream deep learning framework that combines these cues, with one stream learning visual texture cues from image search data, and the other stream learning rich shape information from 3D CAD models. To perform classification or detection for a novel image, the predictions of the two streams are combined using a late fusion scheme. We present experiments and visualizations for both tasks on the standard benchmark PASCAL VOC 2007 to demonstrate that texture and shape provide complementary information in our model. Our method outperforms previous web image based models, 3D CAD model based approaches, and weakly supervised models.
Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D spac
Weakly-supervised object detection has recently attracted increasing attention since it only requires image-levelannotations. However, the performance obtained by existingmethods is still far from being satisfactory compared with fully-supervised obj
While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only
Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to
The classification and regression head are both indispensable components to build up a dense object detector, which are usually supervised by the same training samples and thus expected to have consistency with each other for detecting objects accura