Many real-world tasks involve identifying patterns in data while satisfying background or prior knowledge. In domains like materials discovery, because of flaws and biases in raw experimental data, the identification of X-ray diffraction (XRD) patterns often requires a large amount of manual work to find refined phases that are similar to the ideal theoretical ones. Automatically refining raw XRD patterns using simulated theoretical data is thus desirable. We propose imitation refinement, a novel approach that refines imperfect input patterns, guided by a pre-trained classifier incorporating prior knowledge from simulated theoretical data, so that the refined patterns imitate the ideal data. The classifier is trained on the ideal simulated data to classify patterns and learns an embedding space in which each class is represented by a prototype. The refiner learns to refine imperfect patterns with small modifications, such that their embeddings move closer to the corresponding prototypes. We show that the refiner can be trained in both supervised and unsupervised settings. We further illustrate the effectiveness of the proposed approach, both qualitatively and quantitatively, on a digit refinement task and an X-ray diffraction pattern refinement task in materials discovery.
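To make the training objective concrete, the following is a minimal PyTorch sketch of the imitation-refinement idea: a pre-trained encoder with class prototypes stays fixed while a refiner learns small residual corrections that pull the embeddings of imperfect patterns toward their class prototypes. The module names, layer sizes, and loss weighting are illustrative assumptions rather than the exact architecture used in the paper.

```python
# Hypothetical sketch: prototype-guided refinement of imperfect 1-D patterns.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Embeds a pattern; assumed pre-trained on ideal simulated data and kept frozen."""
    def __init__(self, in_dim=512, emb_dim=64, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))
        # One prototype per class in the embedding space.
        self.prototypes = nn.Parameter(torch.randn(num_classes, emb_dim))

    def forward(self, x):
        return self.net(x)

class Refiner(nn.Module):
    """Predicts a small residual correction to an imperfect pattern."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))

    def forward(self, x):
        return x + self.net(x)  # refined pattern = input + small residual

def refinement_loss(encoder, refiner, x, labels, alpha=0.1):
    refined = refiner(x)
    z = encoder(refined)
    proto = encoder.prototypes[labels]   # prototype of the (pseudo-)label class
    imitation = F.mse_loss(z, proto)     # pull the embedding toward the prototype
    fidelity = F.mse_loss(refined, x)    # keep the modification small
    return imitation + alpha * fidelity
```

In the unsupervised setting mentioned above, the labels in this sketch would be replaced by the classifier's own predictions on the refined patterns.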
Vision-language pre-training (VLP) on large-scale image-text pairs has achieved great success on cross-modal downstream tasks. Most existing pre-training methods adopt a two-step training procedure: they first employ a pre-trained object detector to extract region-based visual features, and then concatenate the image representations with text embeddings as the input to a Transformer for training. However, these methods suffer from the use of task-specific visual representations from a particular object detector for generic cross-modal understanding, and from the computational inefficiency of the two-stage pipeline. In this paper, we propose the first end-to-end vision-language pre-trained model for both V+L understanding and generation, namely E2E-VLP, in which we build a unified Transformer framework to jointly learn visual representations and semantic alignments between image and text. We incorporate the tasks of object detection and image captioning into pre-training with a unified Transformer encoder-decoder architecture to enhance visual learning. An extensive set of experiments has been conducted on well-established vision-language downstream tasks to demonstrate the effectiveness of this novel VLP paradigm.
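As a rough illustration of a single-stream encoder-decoder of this kind, the sketch below feeds CNN grid features and text embeddings jointly into one Transformer, with a language-modeling head standing in for the image-captioning task. The backbone, dimensions, and heads are placeholder assumptions, not the actual E2E-VLP configuration.

```python
# Hypothetical single-stream vision-language encoder-decoder sketch (PyTorch).
import torch
import torch.nn as nn

class UnifiedVLP(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        self.backbone = nn.Sequential(                   # tiny stand-in CNN backbone
            nn.Conv2d(3, d_model, kernel_size=16, stride=16), nn.ReLU())
        self.text_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)    # e.g. a captioning head

    def forward(self, images, text_ids, tgt_ids):
        feats = self.backbone(images)                    # (B, C, H, W) grid features
        vis = feats.flatten(2).transpose(1, 2)           # (B, H*W, C) visual tokens
        txt = self.text_emb(text_ids)                    # (B, T, C) text tokens
        src = torch.cat([vis, txt], dim=1)               # joint visual + text input
        tgt = self.text_emb(tgt_ids)
        out = self.transformer(src, tgt)                 # one encoder-decoder pass
        return self.lm_head(out)                         # caption token logits

model = UnifiedVLP()
logits = model(torch.randn(2, 3, 224, 224),
               torch.randint(0, 30522, (2, 12)),
               torch.randint(0, 30522, (2, 8)))
```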
Inpainting arbitrary missing regions is challenging because learning valid features for diverse masked regions is nontrivial. Although U-shaped encoder-decoder frameworks have proven successful, most of them share a common drawback of mask unawareness in feature extraction: all convolution windows (or regions), including those with various shapes of missing pixels, are treated equally and filtered with fixed learned kernels. To this end, we propose a novel mask-aware inpainting solution. First, a Mask-Aware Dynamic Filtering (MADF) module is designed to effectively learn multi-scale features for missing regions in the encoding phase. Specifically, the filters for each convolution window are generated from the features of the corresponding region of the mask. The second form of mask awareness is achieved by adopting Point-wise Normalization (PN) in the decoding phase, since the statistics of features at masked points differ from those at unmasked points. The proposed PN tackles this issue by dynamically assigning point-wise scaling factors and biases. Finally, our model is designed as an end-to-end cascaded refinement network. Supervision signals such as reconstruction loss, perceptual loss, and total variation loss are incrementally leveraged to improve the inpainting results from coarse to fine. The effectiveness of the proposed framework is validated both quantitatively and qualitatively through extensive experiments on three public datasets: Places2, CelebA, and Paris StreetView.
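The sketch below illustrates, under simplifying assumptions, the two mask-aware ingredients described above: a depth-wise kernel predicted from the mask at every spatial location (in the spirit of MADF), and a point-wise scale and bias conditioned on the mask (in the spirit of PN). Layer sizes and the exact way the kernels are applied are not taken from the paper.

```python
# Hypothetical PyTorch sketch of mask-conditioned dynamic filtering and
# point-wise normalization for image inpainting.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskAwareConv(nn.Module):
    """Applies a per-location depth-wise kernel that is predicted from the mask."""
    def __init__(self, channels=32, k=3):
        super().__init__()
        self.k = k
        self.kernel_pred = nn.Conv2d(1, k * k, kernel_size=k, padding=k // 2)

    def forward(self, feat, mask):
        b, c, h, w = feat.shape
        kernels = self.kernel_pred(mask)                       # (B, k*k, H, W)
        patches = F.unfold(feat, self.k, padding=self.k // 2)  # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.k * self.k, h * w)
        kernels = kernels.view(b, 1, self.k * self.k, h * w)
        return (patches * kernels).sum(dim=2).view(b, c, h, w)

class PointwiseNorm(nn.Module):
    """Per-point scale and bias for decoder features, conditioned on the mask."""
    def __init__(self, channels=32):
        super().__init__()
        self.scale = nn.Conv2d(1, channels, kernel_size=1)
        self.bias = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, feat, mask):
        return feat * (1 + self.scale(mask)) + self.bias(mask)
```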
In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a downstream task. We show that end-to-end ASR features, which integrate both acoustic and text information from speech, achieve promising results. We use an RNN with self-attention as the sentiment classifier, which also provides an easy visualization through attention weights to help interpret model predictions. We use the well-benchmarked IEMOCAP dataset and a new large-scale speech sentiment dataset, SWBD-sentiment, for evaluation. Our approach improves the state-of-the-art accuracy on IEMOCAP from 66.6% to 71.7%, and achieves an accuracy of 70.10% on SWBD-sentiment, which contains more than 49,500 utterances.
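A minimal sketch of such a classifier, assuming pre-extracted ASR features of a fixed dimension, is shown below: a bidirectional GRU encodes the feature sequence and a self-attentive pooling layer aggregates the hidden states, exposing per-frame attention weights for visualization. The feature dimension, hidden size, and number of classes are placeholder assumptions.

```python
# Hypothetical sentiment classifier over pre-extracted end-to-end ASR features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentimentClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, num_classes=3):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)         # one attention score per frame
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, asr_feats):                     # asr_feats: (B, T, feat_dim)
        h, _ = self.rnn(asr_feats)                    # (B, T, 2*hidden)
        weights = F.softmax(self.attn(h), dim=1)      # (B, T, 1), inspectable weights
        pooled = (weights * h).sum(dim=1)             # attention-weighted pooling
        return self.out(pooled), weights              # sentiment logits + attention
```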
In this work, we present an approach for unsupervised domain adaptation (DA) under the constraint that the labeled source data are not directly available and only a classifier trained on the source data is provided. Our solution iteratively labels only high-confidence sub-regions of the target data distribution, based on the classifier's predictions, and then learns new classifiers from the expanding high-confidence dataset. The goal is to apply the proposed approach to DA for sleep apnea detection and to achieve personalization based on the needs of the patient. In a series of experiments with both open and closed sleep monitoring datasets, the proposed approach is applied to data from different sensors for DA between the different datasets. In all experiments, the proposed approach outperforms the classifier trained in the source domain, with improvements in the kappa coefficient ranging from 0.012 to 0.242. Additionally, our solution is applied to digit classification DA between three well-established digit datasets, to investigate the generalizability of the approach and to allow for comparison with related work. Even without direct access to the source data, it achieves good results and outperforms several well-established unsupervised DA methods.
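The loop described above can be summarized in a short scikit-learn sketch: starting from the source-trained classifier, only target samples predicted with high confidence are pseudo-labeled, a new classifier is fit on them, and the accepted region is gradually expanded. The classifier family, threshold schedule, and stopping rule here are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of iterative high-confidence self-labelling for DA
# when only a source-trained classifier (not the source data) is available.
import numpy as np
from sklearn.linear_model import LogisticRegression

def adapt(source_clf, X_target, threshold=0.9, iterations=5):
    clf = source_clf
    for _ in range(iterations):
        proba = clf.predict_proba(X_target)
        conf = proba.max(axis=1)
        keep = conf >= threshold                 # high-confidence sub-region
        pseudo_y = proba.argmax(axis=1)[keep]
        if keep.sum() == 0 or np.unique(pseudo_y).size < 2:
            break                                # nothing reliable (or only one class) left
        clf = LogisticRegression(max_iter=1000).fit(X_target[keep], pseudo_y)
        threshold *= 0.95                        # gradually expand the labelled set
    return clf
```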
Weakly supervised object detection (WSOD), the problem of learning detectors using only image-level labels, has attracted increasing interest. However, this problem is quite challenging due to the lack of location supervision. To address this issue, this paper integrates saliency into a deep architecture, in which the location information is explored both explicitly and implicitly. Specifically, we select highly confident object proposals under the guidance of class-specific saliency maps. The location, semantic, and saliency information of the selected proposals is then used to explicitly supervise the network by imposing two additional losses. Meanwhile, a saliency prediction sub-network is built into the architecture, and its prediction results are used to implicitly guide the localization procedure. The entire network is trained end-to-end. Experiments on PASCAL VOC demonstrate that our approach outperforms state-of-the-art methods.
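As a small illustration of the explicit side of this guidance, the sketch below scores each object proposal by the average class-specific saliency it covers and keeps the most confident ones as pseudo ground truth for the additional losses. The scoring rule and the number of kept proposals are assumptions for illustration, not the paper's exact selection criterion.

```python
# Hypothetical saliency-guided proposal selection (PyTorch).
import torch

def select_proposals(saliency_map, proposals, top_k=5):
    """saliency_map: (H, W) map for one class; proposals: (N, 4) boxes (x1, y1, x2, y2)."""
    scores = []
    for x1, y1, x2, y2 in proposals.long():
        region = saliency_map[y1:y2, x1:x2]
        score = region.mean() if region.numel() > 0 else saliency_map.new_tensor(0.0)
        scores.append(score)
    scores = torch.stack(scores)
    keep = scores.topk(min(top_k, len(scores))).indices   # highest-saliency proposals
    return proposals[keep], scores[keep]

boxes, confs = select_proposals(torch.rand(224, 224),
                                torch.tensor([[10., 10., 60., 80.],
                                              [100., 50., 180., 200.]]))
```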