أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Chun-Liang Li

ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

75 - Chen-Yu Lee , Chun-Liang Li , Chu Wang 2021

Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture rea ding orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks including word labeling and word grouping on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs with a margin up to 8.4% F1-score.

الحساب واللغة التعلم الآلي

Self-Trained One-class Classification for Unsupervised Anomaly Detection

129 - Jinsung Yoon , Kihyuk Sohn , Chun-Liang Li 2021

Anomaly detection (AD), separating anomalies from normal data, has various applications across domains, from manufacturing to healthcare. While most previous works have shown to be effective for cases with fully or partially labeled data, they are le ss practical for AD applications due to tedious data labeling processes. In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples. To tackle this problem, we build a robust one-class classification framework via data refinement. To refine the data accurately, we propose an ensemble of one-class classifiers, each of which is trained on a disjoint subset of training data. Moreover, we propose a self-training of deep representation one-class classifiers (STOC) that iteratively refines the data and deep representations. In experiments, we show the efficacy of our method for unsupervised anomaly detection on benchmarks from image and tabular data domains. For example, with a 10% anomaly ratio on CIFAR-10 data, the proposed method outperforms state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.

التعلم الآلي

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals

79 - Asma Ghandeharioun , Been Kim , Chun-Liang Li 2021

Explaining deep learning model inferences is a promising venue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars. One of the principal benefits of counterfactual expl anations is allowing users to explore what-if scenarios through what does not and cannot exist in the data, a quality that many other forms of explanation such as heatmaps and influence functions are inherently incapable of doing. However, most previous work on generative explainability cannot disentangle important concepts effectively, produces unrealistic examples, or fails to retain relevant information. We propose a novel approach, DISSECT, that jointly trains a generator, a discriminator, and a concept disentangler to overcome such challenges using little supervision. DISSECT generates Concept Traversals (CTs), defined as a sequence of generated examples with increasing degrees of concepts that influence a classifiers decision. By training a generative model from a classifiers signal, DISSECT offers a way to discover a classifiers inherent notion of distinct concepts automatically rather than rely on user-predefined concepts. We show that DISSECT produces CTs that (1) disentangle several concepts, (2) are influential to a classifiers decision and are coupled to its reasoning due to joint training (3), are realistic, (4) preserve relevant information, and (5) are stable across similar inputs. We validate DISSECT on several challenging synthetic and realistic datasets where previous methods fall short of satisfying desirable criteria for interpretability and show that it performs consistently well and better than existing methods. Finally, we present experiments showing applications of DISSECT for detecting potential biases of a classifier and identifying spurious artifacts that impact predictions.

التعلم الآلي الذكاء الاصطناعي

CutPaste: Self-Supervised Learning for Anomaly Detection and Localization

141 - Chun-Liang Li , Kihyuk Sohn , Jinsung Yoon 2021

We aim at constructing a high performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data o nly. We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations. We learn representations by classifying normal data from the CutPaste, a simple data augmentation strategy that cuts an image patch and pastes at a random location of a large image. Our empirical study on MVTec anomaly detection dataset demonstrates the proposed algorithm is general to be able to detect various types of real-world defects. We bring the improvement upon previous arts by 3.1 AUCs when learning representations from scratch. By transfer learning on pretrained representations on ImageNet, we achieve a new state-of-theart 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches to allow localizing defective areas without annotations during training.

الرؤية الحاسوبية وتمييز الأنماط

Learning and Evaluating Representations for Deep One-class Classification

434 - Kihyuk Sohn , Chun-Liang Li , Jinsung Yoon 2020

We present a two-stage framework for deep one-class classification. We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations. The framework not only allows to learn better rep resentations, but also permits building one-class classifiers that are faithful to the target task. We argue that classifiers inspired by the statistical perspective in generative or discriminative models are more effective than existing approaches, such as a normality score from a surrogate classifier. We thoroughly evaluate different self-supervised representation learning algorithms under the proposed framework for one-class classification. Moreover, we present a novel distribution-augmented contrastive learning that extends training distributions via data augmentation to obstruct the uniformity of contrastive representations. In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks, including novelty and anomaly detection. Finally, we present visual explanations, confirming that the decision-making process of deep one-class classifiers is intuitive to humans. The code is available at https://github.com/google-research/deep_representation_one_class.

الرؤية الحاسوبية وتمييز الأنماط

A Simple Semi-Supervised Learning Framework for Object Detection

119 - Kihyuk Sohn , Zizhao Zhang , Chun-Liang Li 2020

Semi-supervised learning (SSL) has a potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image class ification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from $76.30$ to $79.08$; on MS-COCO, STAC demonstrates $2{times}$ higher data efficiency by achieving 24.38 mAP using only 5% labeled data than supervised baseline that marks 23.86% using 10% labeled data. The code is available at https://github.com/google-research/ssl_detection/.

الرؤية الحاسوبية وتمييز الأنماط

Unsupervised Program Synthesis for Images By Sampling Without Replacement

156 - Chenghui Zhou , Chun-Liang Li , Barnabas Poczos 2020

Program synthesis has emerged as a successful approach to the image parsing task. Most prior works rely on a two-step scheme involving supervised pretraining of a Seq2Seq model with synthetic programs followed by reinforcement learning (RL) for fine- tuning with real reference images. Fully unsupervised approaches promise to train the model directly on the target images without requiring curated pretraining datasets. However, they struggle with the inherent sparsity of meaningful programs in the search space. In this paper, we present the first unsupervised algorithm capable of parsing constructive solid geometry (CSG) images into context-free grammar (CFG) without pretraining via non-differentiable renderer. To tackle the emph{non-Markovian} sparse reward problem, we combine three key ingredients -- (i) a grammar-encoded tree LSTM ensuring program validity (ii) entropy regularization and (iii) sampling without replacement from the CFG syntax tree. Empirically, our algorithm recovers meaningful programs in large search spaces (up to $3.8 times 10^{28}$). Further, even though our approach is fully unsupervised, it generalizes better than supervised methods on the synthetic 2D CSG dataset. On the 2D computer aided design (CAD) dataset, our approach significantly outperforms the supervised pretrained model and is competitive to the refined model.

التعلم الآلي التعلم الالي

Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer

122 - Hsueh-Ti Derek Liu , Michael Tao , Chun-Liang Li 2018

Many machine learning image classifiers are vulnerable to adversarial attacks, inputs with perturbations designed to intentionally trigger misclassification. Current adversarial methods directly alter pixel colors and evaluate against pixel norm-ball s: pixel perturbations smaller than a specified magnitude, according to a measurement norm. This evaluation, however, has limited practical utility since perturbations in the pixel space do not correspond to underlying real-world phenomena of image formation that lead to them and has no security motivation attached. Pixels in natural images are measurements of light that has interacted with the geometry of a physical scene. As such, we propose the direct perturbation of physical parameters that underly image formation: lighting and geometry. As such, we propose a novel evaluation measure, parametric norm-balls, by directly perturbing physical parameters that underly image formation. One enabling contribution we present is a physically-based differentiable renderer that allows us to propagate pixel gradients to the parametric space of lighting and geometry. Our approach enables physically-based adversarial attacks, and our differentiable renderer leverages models from the interactive rendering literature to balance the performance and accuracy trade-offs necessary for a memory-efficient and scalable adversarial data augmentation workflow.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد