Linguistic Dependencies and Statistical Dependence

Added by Jacob Louis Hoover

Publication date 2021

Fields Informatics Engineering

Language English

Authors Jacob Louis Hoover - Alessandro Sordoni - Wenyu Du

Abstract

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most $\approx 0.5$. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally comparable to a simple baseline formed by connecting adjacent words. We analyze which kinds of linguistic dependencies are best captured in CPMI dependencies, and also find marked differences between the estimates of the large pretrained language models, illustrating how their different training schemes affect the type of dependencies they capture.
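The evaluation described in the abstract (extract a tree that maximizes pairwise scores, then compare its edges to gold dependencies) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the score matrix is a made-up toy stand-in for CPMI estimates, the function names `max_spanning_tree`, `adjacent_baseline`, and `uuas` are hypothetical, and the tree is an unconstrained maximum spanning tree built with Prim's algorithm.

```python
import numpy as np


def max_spanning_tree(scores):
    """Prim's algorithm on a symmetric score matrix.

    Returns a set of undirected edges (i, j) with i < j.
    """
    n = scores.shape[0]
    in_tree = [0]  # grow the tree from word 0
    edges = set()
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or scores[i, j] > best[0]:
                    best = (scores[i, j], i, j)
        _, i, j = best
        edges.add((min(i, j), max(i, j)))
        in_tree.append(j)
    return edges


def adjacent_baseline(n):
    """Baseline that connects each word to its right neighbour."""
    return {(i, i + 1) for i in range(n - 1)}


def uuas(predicted, gold):
    """Unlabelled undirected attachment score: share of gold edges recovered."""
    gold = {(min(a, b), max(a, b)) for a, b in gold}
    return len(predicted & gold) / len(gold)


# Toy symmetric scores for a 4-word sentence (invented numbers, not real CPMI).
toy_scores = np.array([
    [0.0, 2.0, 0.1, 0.2],
    [2.0, 0.0, 0.5, 1.5],
    [0.1, 0.5, 0.0, 1.0],
    [0.2, 1.5, 1.0, 0.0],
])
toy_gold = [(0, 1), (1, 3), (1, 2)]  # hypothetical gold dependency edges

tree = max_spanning_tree(toy_scores)
print(sorted(tree), uuas(tree, toy_gold))
print(uuas(adjacent_baseline(4), toy_gold))
```

On this toy input the score-based tree and the adjacent-word baseline each recover two of the three gold edges, which mirrors in miniature the kind of comparison the abstract reports.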


On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies

Tianyi Zhang, Tatsunori Hashimoto, 2021

We study how masking and predicting tokens in an unsupervised fashion can give rise to linguistic structures and downstream performance gains. Recent theories have suggested that pretrained language models acquire useful inductive biases through masks that implicitly act as cloze reductions for downstream tasks. While appealing, we show that the success of the random masking strategy used in practice cannot be explained by such cloze-like masks alone. We construct cloze-like masks using task-specific lexicons for three different classification datasets and show that the majority of pretrained performance gains come from generic masks that are not associated with the lexicon. To explain the empirical success of these generic masks, we demonstrate a correspondence between the Masked Language Model (MLM) objective and existing methods for learning statistical dependencies in graphical models. Using this, we derive a method for extracting these learned statistical dependencies in MLMs and show that these dependencies encode useful inductive biases in the form of syntactic structures. In an unsupervised parsing evaluation, simply forming a minimum spanning tree on the implied statistical dependence structure outperforms a classic method for unsupervised parsing (58.74 vs. 55.91 UUAS).

Computation and Language, Artificial Intelligence, Machine Learning

Polynomial methods in statistical inference: theory and practice

Yihong Wu, Pengkun Yang, 2021

This survey provides an exposition of a suite of techniques based on the theory of polynomials, collectively referred to as polynomial methods, which have recently been applied successfully to several challenging problems in statistical inference. Topics including polynomial approximation, polynomial interpolation and majorization, moment space and positive polynomials, and orthogonal polynomials and Gaussian quadrature are discussed, with their major probabilistic and statistical applications in property estimation on large domains and learning mixture models. These techniques provide useful tools not only for the design of highly practical algorithms with provable optimality, but also for establishing the fundamental limits of the inference problems through the method of moment matching. The effectiveness of the polynomial method is demonstrated in concrete problems such as entropy and support size estimation, the distinct elements problem, and learning Gaussian mixture models.

Statistics Theory, Information Theory

Frobenius statistical manifolds & geometric invariants

Noemie Combe, Philippe Combe, Hanna Nencka, 2021

In this paper, we explicitly prove that statistical manifolds related to exponential families and with flat structure connection have a Frobenius manifold structure. This latter object, at the interplay of beautiful interactions between topology and quantum field theory, raises natural questions concerning the existence of Gromov--Witten invariants for those statistical manifolds. We prove that an analog of Gromov--Witten invariants for those statistical manifolds (GWS) exists. Similarly to the original version, these new invariants have a geometric interpretation concerning intersection points of para-holomorphic curves. However, they also play an important role in the learning process, since they determine whether a system has succeeded in learning or failed.

Algebraic Geometry, Information Theory

Statistical Learning Guarantees for Compressive Clustering and Compressive Mixture Modeling

Remi Gribonval, 2020

We provide statistical learning guarantees for two unsupervised learning tasks in the context of compressive statistical learning, a general framework for resource-efficient large-scale learning that we introduced in a companion paper. The principle of compressive statistical learning is to compress a training collection, in one pass, into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. We explicitly describe and analyze random feature functions whose empirical averages preserve the needed information for compressive clustering and compressive Gaussian mixture modeling with fixed known variance, and establish sufficient sketch sizes given the problem dimensions.

Machine Learning, Information Theory

Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation

Jeremiah Birrell, Markos A. Katsoulakis, Yannis Pantazis, 2020

Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and thereby accelerating statistical learning and estimation from data, is lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
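As a standard illustration of what "tightness" means for such formulas (a textbook example, not taken from the abstract itself), compare two well-known variational representations of the Kullback-Leibler divergence. The symbols $P$, $Q$, and the test function $g$ are notation assumed for this sketch.

$$D_{KL}(P \| Q) = \sup_{g} \left\{ \mathbb{E}_P[g] - \log \mathbb{E}_Q\left[e^{g}\right] \right\} \quad \text{(Donsker--Varadhan)}$$

$$D_{KL}(P \| Q) = \sup_{g} \left\{ \mathbb{E}_P[g] - \mathbb{E}_Q\left[e^{g-1}\right] \right\} \quad \text{(Legendre transform)}$$

Both suprema equal the same divergence, but because $\log x \le x/e$ for all $x > 0$, the Donsker--Varadhan objective is at least as large as the Legendre objective for every fixed $g$. In that sense it is the tighter of the two lower bounds.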

Machine Learning Information Theory Information Theory
