Image data has great potential to aid post-earthquake visual inspections of civil engineering structures because of the ease of data acquisition and its advantages in capturing visual information. A variety of techniques have been applied to detect damage automatically from a close-up image of a structural component. However, applying automatic damage detection methods becomes increasingly difficult when the image includes multiple components from different structures. To reduce false-positive alarms, critical structural components need to be recognized first, and the damage alarms need to be cleaned using the component recognition results. To achieve this goal, this study aims at recognizing and extracting bridge components from images of urban scenes. The bridge component recognition begins with pixel-wise classification of an image into 10 scene classes. Then, the original image and the scene classification results are combined to classify the image pixels into five component classes. Multi-scale convolutional neural networks (multi-scale CNNs) are used to perform the pixel-wise classification, and the classification results are post-processed by averaging within superpixels and smoothing with conditional random fields (CRFs). The performance of the bridge component extraction is tested in terms of accuracy and consistency.
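A minimal sketch of the superpixel post-processing step described above, assuming SLIC superpixels and placeholder CNN scores (the superpixel count and compactness are assumptions, not the authors' settings):

```python
# Average per-pixel CNN class scores within SLIC superpixels so each superpixel
# receives one consistent set of scores before labeling.
import numpy as np
from skimage.segmentation import slic

h, w, n_classes = 240, 320, 5
image = np.random.rand(h, w, 3)                  # stand-in for the input RGB image
pixel_scores = np.random.rand(h, w, n_classes)   # stand-in for multi-scale CNN outputs

segments = slic(image, n_segments=400, compactness=10.0)  # superpixel label map (h, w)

smoothed = np.empty_like(pixel_scores)
for s in np.unique(segments):
    mask = segments == s
    smoothed[mask] = pixel_scores[mask].mean(axis=0)       # one averaged score per superpixel

labels = smoothed.argmax(axis=-1)                # per-pixel class after superpixel averaging
# A further CRF pass over these unaries would smooth labels across superpixel
# boundaries, as described in the abstract.
```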
Image data has great potential to aid conventional visual inspections of civil engineering structures because of the ease of data acquisition and its advantages in capturing visual information. A variety of techniques have been proposed to detect damage, such as cracks and spalling, in close-up images of a single component (e.g., columns and road surfaces). However, these techniques commonly suffer from severe false positives, especially when the image includes multiple components of different structures. To reduce false positives and extract reliable information about the condition of structures, detection and localization of critical structural components are important first steps preceding the damage assessment. This study aims at recognizing bridge structural and non-structural components from images of urban scenes. During the bridge component recognition, every image pixel is classified into one of five classes (non-bridge, columns, beams and slabs, other structural, other non-structural) by multi-scale convolutional neural networks (multi-scale CNNs). To reduce false positives and obtain consistent labels, the component classifications are integrated with scene understanding by an additional classifier with 10 higher-level scene classes (building, greenery, person, pavement, signs and poles, vehicles, bridges, water, sky, and others). The bridge component recognition integrated with scene understanding is compared with the naive approach without scene classification in terms of accuracy, false positives, and consistency to demonstrate the effectiveness of the integrated approach.
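A minimal sketch of the two-stage pixel-wise classification described above, in PyTorch; the layer sizes and the concatenation of scene scores with the image are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class SceneNet(nn.Module):
    """Stage 1: per-pixel scores for 10 higher-level scene classes."""
    def __init__(self, n_scene=10):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_scene, 1),
        )
    def forward(self, x):
        return self.body(x)

class ComponentNet(nn.Module):
    """Stage 2: RGB image + scene scores -> 5 bridge component classes."""
    def __init__(self, n_scene=10, n_comp=5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3 + n_scene, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_comp, 1),
        )
    def forward(self, image, scene_scores):
        # Scene probabilities are stacked with the image as extra input channels.
        return self.body(torch.cat([image, scene_scores.softmax(dim=1)], dim=1))

image = torch.rand(1, 3, 240, 320)            # dummy RGB input
scene = SceneNet()(image)                     # (1, 10, 240, 320) scene scores
components = ComponentNet()(image, scene)     # (1, 5, 240, 320) component scores
```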
This paper investigates the automated recognition of structural bridge components using video data. Although understanding video data for structural inspections is straightforward for human inspectors, the implementation of the same task using machine learning methods has not been fully realized. In particular, single-frame image processing techniques, such as convolutional neural networks (CNNs), are not expected to identify structural components accurately when the image is a close-up view lacking contextual information about where on the structure the image originates. Inspired by the significant progress in video processing techniques, this study investigates automated bridge component recognition using video data, where information from past frames is used to augment the understanding of the current frame. A new simulated video dataset is created to train the machine learning algorithms. Then, CNNs with recurrent architectures are designed and applied to implement the automated bridge component recognition task. Results are presented for simulated video data as well as video collected in the field.
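A minimal sketch of a CNN with a recurrent layer over frame features, so that past frames inform the current frame's prediction; the backbone, hidden size, and five-class head are assumptions for illustration, not the authors' network:

```python
import torch
import torch.nn as nn

class RecurrentFrameClassifier(nn.Module):
    def __init__(self, n_classes=5, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                          # per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)  # temporal context
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video):                              # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)                           # hidden state carries past frames
        return self.head(out)                              # per-frame scores (b, t, n_classes)

scores = RecurrentFrameClassifier()(torch.rand(2, 8, 3, 112, 112))
```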
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image. However, there are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs). In this paper, we aim to address this issue and develop a network that can outperform not only the canonical transformers but also the high-performance convolutional models. We propose a new transformer-based hybrid network that takes advantage of transformers to capture long-range dependencies and of CNNs to model local features. Furthermore, we scale it to obtain a family of models, called CMTs, achieving much better accuracy and efficiency than previous convolution- and transformer-based models. In particular, our CMT-S achieves 83.5% top-1 accuracy on ImageNet, while being 14x and 2x smaller in FLOPs than the existing DeiT and EfficientNet, respectively. The proposed CMT-S also generalizes well on CIFAR10 (99.2%), CIFAR100 (91.7%), Flowers (98.7%), and other challenging vision datasets such as COCO (44.3% mAP), with considerably less computational cost.
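A minimal sketch of a hybrid block in the spirit described above, where a depthwise convolution supplies local features and multi-head self-attention supplies long-range dependencies; this is a generic illustration rather than the CMT block itself, and the dimensions are assumptions:

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)        # depthwise conv (local)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # global attention
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                        # x: (batch, dim, H, W)
        x = x + self.local(x)                    # local feature aggregation
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (batch, H*W, dim) token sequence
        n = self.norm(tokens)
        tokens = tokens + self.attn(n, n, n)[0]  # long-range dependencies via self-attention
        tokens = tokens + self.mlp(self.norm(tokens))
        return tokens.transpose(1, 2).view(b, c, h, w)

out = HybridBlock()(torch.rand(1, 64, 14, 14))   # output keeps the (1, 64, 14, 14) shape
```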
Brain extraction is a fundamental first image-processing step used to improve the performance of most neuroimage analysis pipelines. In the case of fetal brain development, however, there is a need for a reliable ultrasound (US)-specific tool. In this work we propose a fully automated 3D CNN approach to fetal brain extraction from 3D US clinical volumes with minimal preprocessing. Our method accurately and reliably extracts the brain regardless of the large data variation inherent in this imaging modality. It also performs consistently throughout a gestational age range of 14 to 31 weeks, regardless of the pose variation of the subject, the scale, and even partial feature obstruction in the image, outperforming all current alternatives.
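A minimal sketch of a fully convolutional 3D CNN producing a voxel-wise brain mask from a 3D ultrasound volume; the layer count and channel widths are illustrative assumptions, not the architecture proposed in the work:

```python
import torch
import torch.nn as nn

brain_extractor = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 1),                         # one logit per voxel: brain vs. non-brain
)

volume = torch.rand(1, 1, 64, 64, 64)            # stand-in for a preprocessed 3D US volume
mask = torch.sigmoid(brain_extractor(volume)) > 0.5   # voxel-wise binary brain mask
```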
The worldwide loss of plant habitats is a warning sign that calls for concerted efforts to conserve plant biodiversity. Thus, plant species classification is of crucial importance for addressing this environmental challenge. In recent years, there has been a considerable increase in the number of studies related to plant taxonomy. While some researchers try to improve their recognition performance using novel approaches, others concentrate on computational optimization of their frameworks. In addition, a few studies delve into feature extraction to achieve significant gains in accuracy. In this paper, we propose an effective method for the leaf recognition problem. In our proposed approach, a leaf goes through pre-processing to extract its refined color image, vein image, xy-projection histogram, handcrafted shape and texture features, and Fourier descriptors. These attributes are then transformed into a better representation by neural-network-based encoders before a support vector machine (SVM) model is utilized to classify the different leaves. Overall, our approach achieves state-of-the-art results on the Flavia leaf dataset, attaining an accuracy of 99.58% on test sets under random 10-fold cross-validation and surpassing previous methods. We also release our code (scripts are available at https://github.com/dinhvietcuong1996/LeafRecognition) to contribute to the research community on the leaf classification problem.
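A minimal sketch of the pipeline described above, where pre-extracted leaf features are mapped to a learned representation by a small neural encoder and then classified with an SVM; the feature dimensionality, encoder width, and data here are placeholders, not the released implementation:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))  # feature encoder

features = np.random.rand(120, 256).astype(np.float32)   # stand-in handcrafted leaf features
labels = np.random.randint(0, 32, size=120)               # stand-in species labels

with torch.no_grad():                                      # encode features (assumed pre-trained)
    embeddings = encoder(torch.from_numpy(features)).numpy()

svm = SVC(kernel="rbf").fit(embeddings, labels)            # SVM classifies encoded features
predictions = svm.predict(embeddings)
```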