Quantity Tagger: A Latent-Variable Sequence Labeling Approach to Solving Addition-Subtraction Word Problems

Added by Yanyan Zou

Publication date: 2019

Field: Informatics Engineering

Language: English

Authors: Yanyan Zou, Wei Lu

Computation and Language

An arithmetic word problem typically includes a textual description containing several constant quantities. The key to solving the problem is to reveal the underlying mathematical relations (such as addition and subtraction) among quantities, and then generate equations to find solutions. This work presents a novel approach, Quantity Tagger, that automatically discovers such hidden relations by tagging each quantity with a sign corresponding to one type of mathematical operation. For each quantity, we assume there exists a latent, variable-sized quantity span surrounding the quantity token in the text, which conveys information useful for determining its sign. Empirical results show that our method achieves 5 and 8 points of accuracy gains on two datasets respectively, compared to prior approaches.
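As a rough illustration of the sign-tagging idea described in the abstract, the sketch below combines already-tagged quantities into an answer. It is not the authors' implementation: the tag set (`+`, `-`, `0`), the function name `solve_from_tags`, and the sample problem are assumptions made for this example, and the tags are supplied by hand rather than predicted from latent quantity spans.

```python
from typing import List, Tuple

# Assumed tag set for illustration: "+" adds the quantity to the answer,
# "-" subtracts it, and "0" marks a quantity that is irrelevant.
SIGN_VALUE = {"+": 1, "-": -1, "0": 0}


def solve_from_tags(tagged: List[Tuple[float, str]]) -> float:
    """Sum the quantities after applying the sign given by each tag."""
    return sum(SIGN_VALUE[tag] * quantity for quantity, tag in tagged)


# Made-up problem: "Tom had 8 apples. He gave 3 to Ann and bought 2 more.
# Ann is 10 years old. How many apples does Tom have now?"
# The tags below are hand-assigned, standing in for a tagger's output.
tagged_quantities = [(8, "+"), (3, "-"), (2, "+"), (10, "0")]
answer = solve_from_tags(tagged_quantities)  # corresponds to x = 8 - 3 + 2
```

Once every quantity carries a sign, producing the equation reduces to a signed sum, so the modelling effort sits entirely in choosing the tags.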

RelWalk A Latent Variable Model Approach to Knowledge Graph Embedding

Danushka Bollegala, Huda Hakami, Yuichi Yoshida (2021)

Embedding entities and relations of a knowledge graph in a low-dimensional space has shown impressive performance in predicting missing links between entities. Although progress has been made, existing methods are heuristically motivated, and theoretical understanding of such embeddings is comparatively underdeveloped. This paper extends the random walk model (Arora et al., 2016a) of word embeddings to Knowledge Graph Embeddings (KGEs) to derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail). Moreover, we show that marginal loss minimisation, a popular objective used in much prior work in KGE, follows naturally from the log-likelihood ratio maximisation under the probabilities estimated from the KGEs according to our theoretical relationship. We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph. Using the derived objective, accurate KGEs are learnt from the FB15K237 and WN18RR benchmark datasets, providing empirical evidence in support of the theory.

Computation and Language

On the Importance of Word Order Information in Cross-lingual Sequence Labeling

Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya (2020)

Word order generally differs across languages. In this paper, we hypothesize that cross-lingual models that fit the word order of the source language might fail to handle target languages. To verify this hypothesis, we investigate whether making models insensitive to the word order of the source language can improve adaptation performance in target languages. To do so, we reduce the source language word order information fitted to sequence encoders and observe the performance changes. In addition, based on this hypothesis, we propose a new method for fine-tuning multilingual BERT in downstream cross-lingual sequence labeling tasks. Experimental results on dialogue natural language understanding, part-of-speech tagging, and named entity recognition tasks show that reducing word order information fitted to the model can achieve better zero-shot cross-lingual performance. Furthermore, our proposed methods can also be applied to strong cross-lingual baselines and improve their performance.

Computation and Language

Neural Latent Dependency Model for Sequence Labeling

Yang Zhou, Yong Jiang, Zechuan Hu (2020)

Sequence labeling is a fundamental problem in machine learning, natural language processing and many other fields. A classic approach to sequence labeling is linear-chain conditional random fields (CRFs). When combined with neural network encoders, they achieve very good performance in many sequence labeling tasks. One limitation of linear-chain CRFs is their inability to model long-range dependencies between labels. High-order CRFs extend linear-chain CRFs by modeling dependencies no longer than their order, but the computational complexity grows exponentially in the order. In this paper, we propose the Neural Latent Dependency Model (NLDM), which models dependencies of arbitrary length between labels with a latent tree structure. We develop an end-to-end training algorithm and a polynomial-time inference algorithm for our model. We evaluate our model on both synthetic and real datasets and show that it outperforms strong baselines.

Machine Learning

Hierarchical Latent Word Clustering

Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan (2016)

This paper presents a new Bayesian non-parametric model by extending the usage of Hierarchical Dirichlet Allocation to extract tree-structured word clusters from text data. The inference algorithm of the model collects words in a cluster if they share a similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and on radiology reports collected from public repositories.

Computation and Language

Semantic Label Smoothing for Sequence to Sequence Problems

Michal Lukasik, Himanshu Jain, Aditya Krishna Menon (2020)

Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label de-noising. However, extending such methods directly to seq2seq settings, such as Machine Translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approaches for seq2seq settings either do token-level smoothing or smooth over sequences generated by randomly substituting tokens in the target sequence. Unlike these works, in this paper, we propose a technique that smooths over well-formed relevant sequences that not only have sufficient n-gram overlap with the target sequence, but are also semantically similar. Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.
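For readers unfamiliar with the baseline this abstract contrasts itself with, the sketch below shows standard uniform label smoothing for a single token. It is not the semantic smoothing the paper proposes; the function name `smooth_one_hot` and the default `epsilon` are assumptions chosen for the example.

```python
from typing import List


def smooth_one_hot(target: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Return a smoothed target distribution for one token.

    A share epsilon of the probability mass is spread uniformly over all
    classes; the remaining 1 - epsilon stays on the true class.
    """
    if not 0 <= target < num_classes:
        raise ValueError("target must be a valid class index")
    uniform_share = epsilon / num_classes
    distribution = [uniform_share] * num_classes
    distribution[target] += 1.0 - epsilon
    return distribution


# Example: 4 classes, true class at index 2.
smoothed = smooth_one_hot(target=2, num_classes=4, epsilon=0.1)
```

Applying this per token is tractable because the vocabulary is finite, whereas smoothing over whole output sequences is not, which is the difficulty the abstract describes.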

Computation and Language Machine Learning