بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

textit{StateCensusLaws.org}: A Web Application for Consuming and Annotating Legal Discourse Learning

58 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Alexander Spangher

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Alexander Spangher - Jonathan May

الحساب واللغة المكتبات الرقمية تفاعل الإنسان والحاسوب

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this work, we create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text. Our system is built primarily with journalists and legal interpreters in mind, and we focus on state-level law that uses U.S. Census population numbers to allocate resources and organize government. Our system exposes a corpus we collect of 6,000 state-level laws that pertain to the U.S. census, using 25 scrapers we built to crawl state law websites, which we release. We also build a novel, flexible annotation framework that can handle span-tagging and relation tagging on an arbitrary input text document and be embedded simply into any webpage. This framework allows journalists and researchers to add to our annotation database by correcting and tagging new data.

قيم البحث

113 - Alexander Spangher , Jonathan May 2021

News article revision histories have the potential to give us novel insights across varied fields of linguistics and social sciences. In this work, we present, to our knowledge, the first publicly available dataset of news article revision histories, or textit{NewsEdits}. Our dataset is multilingual; it contains 1,278,804 articles with 4,609,43

الحساب واللغة المكتبات الرقمية

RTE: A Tool for Annotating Relation Triplets from Text

65 - Ankan Mullick , Animesh Bera , Tapas Nayak 2021

In this work, we present a Web-based annotation tool `Relation Triplets Extractor footnote{https://abera87.github.io/annotate/} (RTE) for annotating relation triplets from the text. Relation extraction is an important task for extracting structured i nformation about real-world entities from the unstructured text available on the Web. In relation extraction, we focus on binary relation that refers to relations between two entities. Recently, many supervised models are proposed to solve this task, but they mostly use noisy training data obtained using the distant supervision method. In many cases, evaluation of the models is also done based on a noisy test dataset. The lack of annotated clean dataset is a key challenge in this area of research. In this work, we built a web-based tool where researchers can annotate datasets for relation extraction on their own very easily. We use a server-less architecture for this tool, and the entire annotation operation is processed using client-side code. Thus it does not suffer from any network latency, and the privacy of the users data is also maintained. We hope that this tool will be beneficial for the researchers to advance the field of relation extraction.

الحساب واللغة

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

72 - Haejun Lee , Drew A. Hudson , Kangwook Lee 2020

We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level language r epresentations: contextualized word representations derived from language model objectives at one extreme and a whole sequence representation learned by order classification of two given textual segments at the other. However, these models are not directly encouraged to capture representations of intermediate-size structures that exist in natural languages such as sentences and the relationships among them. To that end, we propose a new approach to encourage learning of a contextualized sentence-level representation by shuffling the sequence of input sentences and training a hierarchical transformer model to reconstruct the original ordering. Through experiments on downstream tasks such as GLUE, SQuAD, and DiscoEval, we show that this feature of our model improves the performance of the original BERT by large margins.

الحساب واللغة التعلم الآلي

Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems

92 - Shyam Upadhyay , Ming-Wei Chang 2016

We propose a new evaluation for automatic solvers for algebra word problems, which can identify mistakes that existing evaluations overlook. Our proposal is to evaluate such solvers using derivations, which reflect how an equation system was construc ted from the word problem. To accomplish this, we develop an algorithm for checking the equivalence between two derivations, and show how derivation an- notations can be semi-automatically added to existing datasets. To make our experiments more comprehensive, we include the derivation annotation for DRAW-1K, a new dataset containing 1000 general algebra word problems. In our experiments, we found that the annotated derivations enable a more accurate evaluation of automatic solvers than previously used metrics. We release derivation annotations for over 2300 algebra word problems for future evaluations.

الحساب واللغة

Cross-tier web programming for curated databases: A case study

133 - Simon Fowler , Simon D. Harding , Joanna Sharman 2020

Curated databases have become important sources of information across scientific disciplines, and due to the manual work of experts, often become important reference works. Features such as provenance tracking, archiving, and data citation are widely regarded as important features for curated databases, but implementing such features is challenging, and small database projects often lack the resources to do so. A scientific database application is not just the database itself, but also an ecosystem of web applications to display the data, and applications supporting data curation. Supporting advanced curation features requires changing all of these components, and there is currently no way to provide such capabilities in a reusable way. Cross-tier programming languages have been proposed to simplify the creation of web applications, where developers write an application in a single, uniform language. Consequently, database queries and updates can be written in the same language as the rest of the program, and at least in principle, it should be possible to provide curation features reusably via program transformations. As a first step, it is important to establish that realistic curated databases can be implemented in a cross-tier programming language. In this paper, we describe such a case study: reimplementing the web frontend of a real-world scientific database, the IUPHAR/BPS Guide to Pharmacology (GtoPdb), in the Links programming language. We show how features such as language-integrated query simplify the development process, and rule out common errors. We show that the Links implementation performs fewer database queries, while the time needed to handle the queries is comparable to the Java version. While there is some overhead to using Links because of its comparative immaturity compared to Java, the Links version is viable as a proof-of-concept case study.

لغات البرمجة المكتبات الرقمية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

معهد تكنولوجيا المعلومات ITI

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

textit{StateCensusLaws.org}: A Web Application for Consuming and Annotating Legal Discourse Learning

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً