
Holistic Multi-View Building Analysis in the Wild with Projection Pooling

Added by Krzysztof Maziarz
Publication date: 2020
Language: English





We address six classification tasks related to fine-grained building attributes: construction type, number of floors, pitch and geometry of the roof, facade material, and occupancy class. Tackling such a remote building analysis problem became possible only recently thanks to growing large-scale datasets of urban scenes. To this end, we introduce a new benchmarking dataset consisting of 49426 images (top-view and street-view) of 9674 buildings. These images are further linked with geometric metadata. The dataset showcases various real-world challenges, such as occlusions, blur, partially visible objects, and a broad spectrum of buildings. We propose a new projection pooling layer that creates a unified, top-view representation of the top view and the side views in a high-dimensional space. It allows us to utilize building and imagery metadata seamlessly. Introducing this layer improves classification accuracy compared to highly tuned baseline models, indicating its suitability for building analysis.
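The paper's exact formulation of projection pooling is not reproduced above, so the following is only a minimal sketch of the general idea: street-view features are scattered onto the top-view grid cells they project to (indices assumed precomputed from the geometric metadata) and then fused with the top-view feature map. The tensor shapes, the scatter-mean aggregation, and the 1x1 fusion convolution are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ProjectionPooling(nn.Module):
    """Illustrative sketch only: pool per-image street-view features into the
    top-view grid cells they project to, then fuse with the top-view feature
    map. Shapes and the scatter-mean aggregation are assumptions."""

    def __init__(self, grid_size: int, feat_dim: int):
        super().__init__()
        self.grid_size = grid_size
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, top_feat, side_feats, side_cells):
        # top_feat:   (B, C, G, G)  top-view feature map
        # side_feats: (B, N, C)     one feature vector per street-view image
        # side_cells: (B, N) long   grid-cell index each street view maps to,
        #                           assumed precomputed from camera geometry
        B, N, C = side_feats.shape
        G = self.grid_size
        canvas = torch.zeros(B, C, G * G, device=top_feat.device)
        counts = torch.zeros(B, 1, G * G, device=top_feat.device)
        idx = side_cells.unsqueeze(1).expand(B, C, N)
        canvas.scatter_add_(2, idx, side_feats.transpose(1, 2))
        counts.scatter_add_(2, side_cells.unsqueeze(1),
                            torch.ones(B, 1, N, device=top_feat.device))
        # Average the street-view features that fall into each top-view cell
        side_grid = (canvas / counts.clamp(min=1)).view(B, C, G, G)
        # Concatenate top-view and projected side-view features, fuse with 1x1 conv
        return self.fuse(torch.cat([top_feat, side_grid], dim=1))
```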


Related research

Most 3D reconstruction methods can only recover scene properties up to a global scale ambiguity. We present a novel approach to single view metrology that can recover the absolute scale of a scene, represented by 3D heights of objects or camera height above the ground, as well as camera parameters of orientation and field of view, using just a monocular image acquired in unconstrained conditions. Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights, through estimation of bounding box projections. We leverage categorical priors for objects such as humans or cars that commonly occur in natural images as references for scale estimation. We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion. Furthermore, the perceptual quality of our outputs is validated by a user study.
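The geometric constraint underlying this kind of single-view metrology is the classical relation between camera height, the horizon line, and an object's bounding box. The worked example below is not the authors' code; it assumes an upright object resting on the ground plane and negligible camera roll.

```python
def object_height_from_bbox(v_top, v_bottom, v_horizon, camera_height_m):
    """Classical single-view-metrology relation: for an upright object on the
    ground plane, with image rows v increasing downward,
        H_obj ~= H_cam * (v_bottom - v_top) / (v_bottom - v_horizon).
    The network described in the abstract learns to predict the unknowns
    (horizon, camera height, field of view) from categorical priors such as
    typical human or car heights; this helper only illustrates the constraint."""
    return camera_height_m * (v_bottom - v_top) / (v_bottom - v_horizon)

# Example: a person whose box spans rows 220..520, horizon at row 260,
# camera 1.6 m above the ground -> roughly 1.85 m tall.
print(object_height_from_bbox(220, 520, 260, 1.6))
```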
Automated high-throughput plant phenotyping involves leveraging sensors, such as RGB, thermal, and hyperspectral cameras (among others), to make large-scale and rapid measurements of the physical properties of plants for the purpose of better understanding the differences between crops and facilitating rapid plant breeding programs. One of the most basic phenotyping tasks is to determine the cultivar, or species, in a particular sensor product. This simple phenotype can be used to detect errors in planting and to learn the most differentiating features between cultivars. It is also a challenging visual recognition task, as a large number of highly related crops are grown simultaneously, leading to a classification problem with low inter-class variance. In this paper, we introduce the Sorghum-100 dataset, a large dataset of RGB imagery of sorghum captured by a state-of-the-art gantry system; a multi-resolution network architecture that learns both global and fine-grained features on the crops; and a new global pooling strategy called Dynamic Outlier Pooling, which outperforms standard global pooling strategies on this task.
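The exact definition of Dynamic Outlier Pooling is not given in the abstract, so the sketch below only illustrates the general idea of an outlier-robust global pooling step: trim extreme spatial activations per channel before averaging. The trimming rule and the keep_frac parameter are assumptions for illustration.

```python
import torch

def outlier_trimmed_global_pool(feat, keep_frac=0.8):
    """Hypothetical outlier-aware global pooling (not the paper's exact rule):
    sort spatial activations per channel and average only the central
    keep_frac of them, so a few extreme responses (glare, overlapping
    neighbouring plants) do not dominate the descriptor.
    feat: (B, C, H, W) -> (B, C)."""
    B, C, H, W = feat.shape
    flat = feat.view(B, C, H * W)
    sorted_vals, _ = flat.sort(dim=2)
    n = H * W
    drop = int(n * (1 - keep_frac) / 2)   # trim both tails equally
    kept = sorted_vals[:, :, drop:n - drop]
    return kept.mean(dim=2)
```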
Current methods for Earth observation tasks such as semantic mapping, map alignment, and change detection rely on near-nadir images; however, the first images available in response to dynamic world events such as natural disasters are often oblique. These tasks are much more difficult for oblique images due to observed object parallax. There has been recent success in learning to regress geocentric pose, defined as height above ground and orientation with respect to gravity, by training with airborne lidar registered to satellite images. We present a model for this novel task that exploits affine invariance properties to exceed state-of-the-art performance by a wide margin. We also address practical issues required to deploy this method in the wild for real-world applications. Our data and code are publicly available.
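As a rough illustration of why geocentric pose helps with parallax (a sketch under assumptions, not the authors' implementation): if each pixel's height above ground is known, together with the image-space direction and scale of the projected gravity vector, the displacement that maps elevated pixels back to their ground footprint can be computed directly. The flow parameterisation below (angle plus pixels per metre of height) is an assumption.

```python
import numpy as np

def parallax_offsets(height_map, angle_rad, scale_px_per_m):
    """Given per-pixel height above ground (metres), the in-image direction of
    the projected gravity vector, and a parallax scale (pixels per metre of
    height), return the pixel displacement that would orthorectify each
    elevated pixel onto its ground footprint."""
    dx = scale_px_per_m * height_map * np.cos(angle_rad)
    dy = scale_px_per_m * height_map * np.sin(angle_rad)
    return dx, dy

# Example: a rooftop pixel 12 m above ground, parallax direction 30 degrees,
# 0.5 px of parallax per metre of height.
dx, dy = parallax_offsets(np.array([[12.0]]), np.deg2rad(30.0), 0.5)
```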
Human affective recognition is an important factor in human-computer interaction. However, methods developed on in-the-wild data are not yet accurate enough for practical use. In this paper, we introduce the affective recognition method focusing on valence-arousal (VA) and expression (EXP) that was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2020 Contest. Since affective behaviors have many observable features, each with its own time frame, we introduced multiple optimized time windows (short-term, middle-term, and long-term) into our analysis framework for extracting feature parameters from video data. Moreover, multiple modalities are used, including action units, head pose, gaze, posture, and ResNet-50 or EfficientNet features, and these are optimized during feature extraction. We then generated affective recognition models for each time window and ensembled these models together, as sketched below. We also fused the valence, arousal, and expression models to enable multi-task learning, considering that the basic psychological states behind facial expressions are closely related to one another. On the validation set, our model achieved a valence-arousal score of 0.498 and a facial expression score of 0.471. These results show that our proposed framework can improve estimation accuracy and robustness effectively.
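The abstract does not specify how the per-window models are combined, so the following is only a plausible sketch of ensembling predictions from the short-, middle-, and long-term window models by weighted averaging; the weighting scheme is an assumption.

```python
import numpy as np

def ensemble_time_windows(window_preds, weights=None):
    """Combine per-frame class probabilities from models trained on different
    temporal windows. window_preds: list of arrays, each (T, K). Returns (T, K).
    A weighted average is one plausible combination rule, shown for illustration."""
    preds = np.stack(window_preds)                 # (n_windows, T, K)
    if weights is None:
        weights = np.ones(len(window_preds))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return np.tensordot(weights, preds, axes=1)    # weighted sum over windows
```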
Human affective recognition is an important factor in human-computer interaction. However, methods developed on in-the-wild data are not yet accurate enough for practical use. In this paper, we introduce the affective recognition method focusing on facial expression (EXP) and valence-arousal estimation that was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest. When annotating facial expressions in a video, we assumed that they are judged not only from features common to all people but also from relative changes in each individual's time series. Therefore, after learning the common features for each frame, we constructed a facial expression estimation model and a valence-arousal model using time-series data that combines the common features with features standardized per video, as in the sketch below. Furthermore, these features were learned from multi-modal data such as image features, action units, head pose, and gaze. On the validation set, our model achieved a facial expression score of 0.546. These results show that our proposed framework can improve estimation accuracy and robustness effectively.
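One way to realise the "relative changes in each individual's time series" idea is to standardize features over the frames of each video and concatenate them with the raw per-frame features, so the model sees deviations from that subject's own baseline. The exact recipe used by the authors is not given; this helper is an illustrative assumption.

```python
import numpy as np

def per_video_standardize(frame_feats, eps=1e-6):
    """frame_feats: (T, D) per-frame features for one video.
    Returns (T, 2D): raw features concatenated with per-video z-scored ones."""
    mean = frame_feats.mean(axis=0, keepdims=True)
    std = frame_feats.std(axis=0, keepdims=True)
    standardized = (frame_feats - mean) / (std + eps)
    return np.concatenate([frame_feats, standardized], axis=1)
```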
