Bird's Eye: Probing for Linguistic Graph Structures with a Simple Information-Theoretic Approach



Publication date: 2021

Field: Informatics Engineering

Research language: English

Authors: Yifan Hou, Mrinmaya Sachan

Computation and Language


NLP has a rich history of representing our prior understanding of language in the form of graphs. Recent work on analyzing contextualized text representations has focused on hand-designed probe models to understand how and to what extent these representations encode a particular linguistic phenomenon. However, due to the inter-dependence of various phenomena and the randomness of training probe models, detecting how these representations encode the rich information in these linguistic graphs remains a challenging problem. In this paper, we propose a new information-theoretic probe, Bird's Eye, which is a fairly simple probe method for detecting if and how these representations encode the information in these linguistic graphs. Instead of using classifier performance, our probe takes an information-theoretic view of probing and estimates the mutual information between the linguistic graph embedded in a continuous space and the contextualized word representations. Furthermore, we also propose an approach to use our probe to investigate localized linguistic information in the linguistic graphs using perturbation analysis. We call this probing setup Worm's Eye. Using these probes, we analyze BERT models on their ability to encode a syntactic and a semantic graph structure, and find that these models encode both syntactic and semantic information to some degree, albeit syntactic information to a greater extent.


Information-Theoretic Probing for Linguistic Structure

Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay (2020)

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that simpler models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic operationalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation. The experimental portion of our paper focuses on empirically estimating the mutual information between a linguistic property and BERT, comparing these estimates to several baselines. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research, plus English, totalling eleven languages.

Computation and Language, Machine Learning
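
The mutual-information view in the abstract above can be illustrated with a small sketch. Since I(T; R) = H(T) - H(T | R), and a probe's held-out cross-entropy is, in expectation, an upper bound on H(T | R), a lower cross-entropy gives a tighter lower-bound estimate of the mutual information. The Python below is not the authors' code: the function names, the toy labels, and the probe probabilities are hypothetical, and the plug-in entropy is itself only an estimate.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Plug-in estimate of H(T) in bits from the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cross_entropy_bits(labels, probe_probs):
    """Average negative log2-probability the probe assigns to each gold label."""
    total = sum(math.log2(probs[gold]) for gold, probs in zip(labels, probe_probs))
    return -total / len(labels)


def mi_lower_bound_bits(labels, probe_probs):
    """Estimate of H(T) - H_probe(T | R), a lower bound on I(T; R) in expectation."""
    return entropy_bits(labels) - cross_entropy_bits(labels, probe_probs)


# Hypothetical toy data: four tokens with gold part-of-speech labels.
labels = ["NOUN", "VERB", "NOUN", "VERB"]

# A weak probe that ignores the representation and always predicts 50/50.
weak_probe = [{"NOUN": 0.5, "VERB": 0.5} for _ in labels]

# A stronger probe that puts probability 0.9 on the gold label of every token.
strong_probe = [
    {"NOUN": 0.9, "VERB": 0.1} if gold == "NOUN" else {"NOUN": 0.1, "VERB": 0.9}
    for gold in labels
]

weak_bound = mi_lower_bound_bits(labels, weak_probe)
strong_bound = mi_lower_bound_bits(labels, strong_probe)
print(f"weak probe bound:   {weak_bound:.3f} bits")
print(f"strong probe bound: {strong_bound:.3f} bits")
```

On this toy data the weak probe yields a bound of zero bits, while the stronger probe recovers most of the one bit of label entropy, which is the sense in which a better probe gives a tighter estimate.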

A Bayesian Framework for Information-Theoretic Probing

Tiago Pimentel, Ryan Cotterell (2021)

Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective. They argue that probing should be seen as approximating a mutual information. This led to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences. The mutual information, however, assumes the true probability distribution of a pair of random variables is known, leading to unintuitive results in settings where it is not. This paper proposes a new framework to measure what we term Bayesian mutual information, which analyses information from the perspective of Bayesian agents, allowing for more intuitive findings in scenarios with finite data. For instance, under Bayesian MI we have that data can add information, processing can help, and information can hurt, which makes it more intuitive for machine learning applications. Finally, we apply our framework to probing where we believe Bayesian mutual information naturally operationalises ease of extraction by explicitly limiting the available background knowledge to solve a task.

Computation and Language, Information Theory

Linguistic Structures as Weak Supervision for Visual Scene Graph Generation

Keren Ye, Adriana Kovashka (2021)

Prior work in scene graph generation requires categorical supervision at the level of triplets - subjects and objects, and predicates that relate them, either with or without bounding box information. However, scene graph generation is a holistic task: thus holistic, contextual supervision should intuitively improve performance. In this work, we explore how linguistic structures in captions can benefit scene graph generation. Our method captures the information provided in captions about relations between individual triplets, and context for subjects and objects (e.g. visual properties are mentioned). Captions are a weaker type of supervision than triplets since the alignment between the exhaustive list of human-annotated subjects and objects in triplets, and the nouns in captions, is weak. However, given the large and diverse sources of multimodal data on the web (e.g. blog posts with images and captions), linguistic supervision is more scalable than crowdsourced triplets. We show extensive experimental comparisons against prior methods which leverage instance- and image-level supervision, and ablate our method to show the impact of leveraging phrasal and sequential context, and techniques to improve localization of subjects and objects.

Computer Vision and Pattern Recognition

3D-BEVIS: Bird's-Eye-View Instance Segmentation

Cathrin Elich, Francis Engelmann, Theodora Kontogianni (2019)

Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. A lot of progress was made in the field of object classification and semantic segmentation. However, the task of instance segmentation is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods scale linearly with the number of points by processing local sub-parts of a scene individually. However, to perform instance segmentation by clustering, globally consistent features are required. Therefore, we propose to combine local point geometry with global context information from an intermediate bird's-eye view representation.

Computer Vision and Pattern Recognition

Stratified Sampling for the Ising Model: A Graph-Theoretic Approach

Amanda Streib, Noah Streib, Isabel Beichl (2013)

We present a new approach to a classical problem in statistical physics: estimating the partition function and other thermodynamic quantities of the ferromagnetic Ising model. Markov chain Monte Carlo methods for this problem have been well-studied, although an algorithm that is truly practical remains elusive. Our approach takes advantage of the fact that, for a fixed bond strength, studying the ferromagnetic Ising model is a question of counting particular subgraphs of a given graph. We combine graph theory and heuristic sampling to determine coefficients that are independent of temperature and that, once obtained, can be used to determine the partition function and to compute physical quantities such as mean energy, mean magnetic moment, specific heat, and magnetic susceptibility.

Statistical Mechanics
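
The idea that the Ising partition function reduces to counting subgraphs, with coefficients that do not depend on temperature, can be checked on a tiny graph. The abstract does not say which subgraphs are counted, so the sketch below assumes the standard high-temperature expansion, Z = 2^n cosh(βJ)^|E| Σ_k c_k tanh(βJ)^k, where c_k is the number of edge subsets of size k in which every vertex has even degree. It uses exhaustive enumeration rather than the heuristic sampling the paper proposes, and all function names are hypothetical.

```python
import itertools
import math


def partition_function_spins(n, edges, beta_j):
    """Brute force: sum exp(beta*J * sum over edges of s_i * s_j) over all 2^n spin states."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        aligned = sum(spins[i] * spins[j] for i, j in edges)
        total += math.exp(beta_j * aligned)
    return total


def even_subgraph_counts(n, edges):
    """counts[k] = number of size-k edge subsets in which every vertex has even degree."""
    counts = [0] * (len(edges) + 1)
    for mask in range(1 << len(edges)):
        degree = [0] * n
        size = 0
        for idx, (i, j) in enumerate(edges):
            if (mask >> idx) & 1:
                degree[i] += 1
                degree[j] += 1
                size += 1
        if all(d % 2 == 0 for d in degree):
            counts[size] += 1
    return counts


def partition_function_subgraphs(n, edges, beta_j, counts):
    """Evaluate Z from the temperature-independent coefficients in counts."""
    t = math.tanh(beta_j)
    series = sum(c * t**k for k, c in enumerate(counts))
    return 2**n * math.cosh(beta_j) ** len(edges) * series


# A 4-cycle (equivalently a 2x2 grid): vertices 0..3 joined in a ring.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
counts = even_subgraph_counts(4, edges)  # computed once, reused at every temperature

for beta_j in (0.1, 0.5, 1.0):
    z_spins = partition_function_spins(4, edges, beta_j)
    z_graph = partition_function_subgraphs(4, edges, beta_j, counts)
    print(f"beta*J={beta_j}: spins={z_spins:.6f}, subgraphs={z_graph:.6f}")
```

For the 4-cycle only the empty subgraph and the full cycle have all-even degrees, so the coefficient list is short; the two evaluations of Z should agree at each value of βJ up to floating-point error.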