ArGoT: A Glossary of Terms extracted from the arXiv


Abstract

We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the articles' source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. The terms can then be organized into a dependency graph by using the terms' definitions and the arXiv's metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the text's vector representation and how these embeddings capture relations of entailment between mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing Interactive Theorem Prover (ITP) libraries of formally verified statements.
