Building Information Modeling and Classification by Visual Learning At A City Scale

Added by Chaofeng Wang

Publication date: 2019

Field: Informatics Engineering

Language: English

Authors: Qian Yu, Chaofeng Wang, Barbaros Cetiner

Computer Vision and Pattern Recognition; Machine Learning

In this paper, we provide two case studies to demonstrate how artificial intelligence can empower civil engineering. In the first case, a machine learning-assisted framework, BRAILS, is proposed for city-scale building information modeling. Building information modeling (BIM) is an efficient way of describing buildings, which is essential to architecture, engineering, and construction. Our proposed framework employs deep learning techniques to extract visual information about buildings from satellite and street view images. Further, a novel machine learning (ML)-based statistical tool, SURF, is proposed to discover the spatial patterns in building metadata. The second case focuses on the task of soft-story building classification. Soft-story buildings are a type of building prone to collapse during a moderate or severe earthquake. Hence, identifying and retrofitting such buildings is vital to current earthquake preparedness efforts. For this task, we propose an automated deep learning-based procedure for identifying soft-story buildings from street view images at a regional scale. We also create a large-scale building image database and a semi-automated image labeling approach that effectively annotates new database entries. Through extensive computational experiments, we demonstrate the effectiveness of the proposed method.

Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention

Richard Droste, Yifan Cai, Harshita Sharma (2019)

Image representations are commonly learned from class labels, which are a simplistic approximation of human image understanding. In this paper we demonstrate that transferable representations of images can be learned without manual annotations by modeling human visual attention. The basis of our analyses is a unique gaze tracking dataset of sonographers performing routine clinical fetal anomaly screenings. Models of sonographer visual attention are learned by training a convolutional neural network (CNN) to predict gaze on ultrasound video frames through visual saliency prediction or gaze-point regression. We evaluate the transferability of the learned representations to the task of ultrasound standard plane detection in two contexts. Firstly, we perform transfer learning by fine-tuning the CNN with a limited number of labeled standard plane images. We find that fine-tuning the saliency predictor is superior to training from random initialization, with an average F1-score improvement of 9.6% overall and 15.3% for the cardiac planes. Secondly, we train a simple softmax regression on the feature activations of each CNN layer in order to evaluate the representations independently of transfer learning hyper-parameters. We find that the attention models derive strong representations, approaching the precision of a fully-supervised baseline model for all but the last layer.

Computer Vision and Pattern Recognition; Machine Learning; Neural and Evolutionary Computing
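
The second evaluation in the abstract above, a softmax regression trained on frozen feature activations, is often called a linear probe. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the toy feature vectors, the learning rate, the epoch count, and the function names (train_softmax_probe, predict) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _logits(weights, biases, x):
    # One linear score per class: w_c . x + b_c
    return [sum(wj * xj for wj, xj in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_softmax_probe(features, labels, num_classes, lr=0.3, epochs=200):
    """Fit a softmax regression on fixed (frozen) features by
    full-batch gradient descent on the mean cross-entropy loss."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(_logits(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err / n
                for j in range(dim):
                    grad_w[c][j] += err * x[j] / n
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = _logits(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 2-D "layer activations" for two classes (toy data).
toy_features = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.1, 2.2]]
toy_labels = [0, 0, 1, 1]
w, b = train_softmax_probe(toy_features, toy_labels, num_classes=2)
print([predict(w, b, x) for x in toy_features])
```

Because only the probe's weights are trained, the resulting accuracy reflects the quality of the features rather than fine-tuning choices, which is the motivation the abstract gives for this evaluation.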

Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

Yuke Zhu, Ce Zhang, Christopher Re (2015)

The complexity of the visual world creates significant challenges for comprehensive visual understanding. In spite of recent successes in visual recognition, today's vision systems would still struggle to deal with visual queries that require deeper reasoning. We propose a knowledge base (KB) framework to handle an assortment of visual queries, without the need to train new classifiers for new tasks. Building such a large-scale multimodal KB presents a major challenge of scalability. We cast a large-scale MRF into a KB representation, incorporating visual, textual and structured data, as well as their diverse relations. We introduce a scalable knowledge base construction system that is capable of building a KB with half a billion variables and millions of parameters in a few hours. Our system achieves competitive results compared to purpose-built models on standard recognition and retrieval tasks, while exhibiting greater flexibility in answering richer visual queries.

Computer Vision and Pattern Recognition; Machine Learning

Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations

Xinyue Huo, Lingxi Xie, Longhui Wei (2020)

Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignore spatial information, which is often crucial for visual representation. This paper presents heterogeneous contrastive learning (HCL), an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations. We demonstrate the effectiveness of HCL by showing that (i) it achieves higher accuracy in instance discrimination and (ii) it surpasses existing pre-training methods in a series of downstream tasks while halving the pre-training costs. More importantly, we show that our approach achieves higher efficiency in visual representations, and thus delivers a key message to inspire future research on self-supervised visual representation learning.

Computer Vision and Pattern Recognition; Machine Learning
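
The abstract above does not state which contrastive objective HCL uses. As background only, instance discrimination is commonly trained with an InfoNCE-style loss, and the sketch below shows that generic loss in plain Python. It does not implement HCL's spatial encoding; the function names, the temperature value, and the toy embeddings are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding.

    The loss is the cross-entropy of picking the positive out of the
    set {positive} + negatives, using temperature-scaled cosine
    similarities as logits.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Toy embeddings (hypothetical): the positive is aligned with the anchor.
anchor = [1.0, 0.0]
aligned = info_nce_loss(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned < misaligned)
```

The loss is small when the anchor is closer to its positive (another augmented view of the same image) than to any negative, and large otherwise.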

CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions

Qi Feng, Vitaly Ablavsky, Stan Sclaroff (2021)

Natural Language (NL) descriptions can be one of the most convenient ways, or the only way, to interact with systems built to understand and detect city-scale traffic patterns and vehicle-related events. In this paper, we extend the widely adopted CityFlow Benchmark with NL descriptions for vehicle targets and introduce the CityFlow-NL Benchmark. CityFlow-NL contains more than 5,000 unique and precise NL descriptions of vehicle targets, making it, to our knowledge, the first multi-target multi-camera tracking dataset with NL descriptions. Moreover, the dataset facilitates research at the intersection of multi-object tracking, retrieval by NL descriptions, and temporal localization of events. In this paper, we focus on two foundational tasks: the Vehicle Retrieval by NL task and the Vehicle Tracking by NL task, which take advantage of the proposed CityFlow-NL benchmark and provide a strong basis for future research on the multi-target multi-camera tracking by NL description task.

Computer Vision and Pattern Recognition

Learning Visual Representations for Transfer Learning by Suppressing Texture

Shlok Mishra, Anshul Shah, Ankan Bansal (2020)

Recent literature has shown that features obtained from supervised training of CNNs may over-emphasize texture rather than encoding high-level information. In self-supervised learning in particular, texture as a low-level cue may provide shortcuts that prevent the network from learning higher-level representations. To address these problems we propose to use classic methods based on anisotropic diffusion to augment training using images with suppressed texture. This simple method helps retain important edge information and suppress texture at the same time. We empirically show that our method achieves state-of-the-art results on object detection and image classification on eight diverse datasets in either supervised or self-supervised learning tasks such as MoCoV2 and Jigsaw. Our method is particularly effective for transfer learning tasks and we observed improved performance on five standard transfer learning datasets. The large improvements (up to 11.49%) on the Sketch-ImageNet dataset and the DTD dataset, together with additional visual analyses using saliency maps, suggest that our approach helps in learning representations that transfer better.

Computer Vision and Pattern Recognition; Machine Learning
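
The abstract above refers to classic anisotropic diffusion without naming a variant or its parameters. A minimal sketch of one classic form, Perona-Malik diffusion with an exponential conductance, is given below to illustrate how small texture differences are smoothed while large edges are kept. The function name, the kappa and step values, and the toy image are assumptions, and this is not the paper's augmentation pipeline.

```python
import math


def perona_malik(image, iterations=10, kappa=0.1, step=0.2):
    """Perona-Malik anisotropic diffusion on a 2-D grayscale image
    given as a list of rows of floats.

    Each pixel moves toward its 4 neighbours, weighted by the
    conductance exp(-(diff / kappa)^2): small differences (texture)
    diffuse freely, large differences (edges) are almost blocked.
    step should stay at or below 0.25 for stability with 4 neighbours.
    """
    rows, cols = len(image), len(image[0])
    current = [row[:] for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                center = current[r][c]
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    # Replicate border pixels (zero flux at the boundary).
                    nr = min(max(r + dr, 0), rows - 1)
                    nc = min(max(c + dc, 0), cols - 1)
                    diff = current[nr][nc] - center
                    flux += math.exp(-((diff / kappa) ** 2)) * diff
                updated[r][c] = center + step * flux
        current = updated
    return current


# Toy image (hypothetical): a dark left half and a bright right half,
# each overlaid with a faint checkerboard "texture" of amplitude 0.02.
toy = [[(0.0 if c < 4 else 1.0) + (0.02 if (r + c) % 2 == 0 else -0.02)
        for c in range(8)] for r in range(6)]
smoothed = perona_malik(toy, iterations=20)
left = [row[c] for row in smoothed for c in range(4)]
print(max(left) - min(left))
```

In this toy image the checkerboard inside each half flattens out, while the step between the two halves stays sharp because the conductance across it is close to zero.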
