New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method

الاسم المسمى للكيان في النص القانوني التاريخي: طريقة فرقة محول وآلة الدولة

381 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

عقود اللغة الإنجليزية historic legal text machine ensemble method النص القانوني التاريخي طريقة الفرقة آلة صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Older legal texts are often scanned and digitized via Optical Character Recognition (OCR), which results in numerous errors. Although spelling and grammar checkers can correct much of the scanned text automatically, Named Entity Recognition (NER) is challenging, making correction of names difficult. To solve this, we developed an ensemble language model using a transformer neural network architecture combined with a finite state machine to extract names from English-language legal text. We use the US-based English language Harvard Caselaw Access Project for training and testing. Then, the extracted names are subjected to heuristic textual analysis to identify errors, make corrections, and quantify the extent of problems. With this system, we are able to extract most names, automatically correct numerous errors and identify potential mistakes that can later be reviewed for manual correction.

References used

https://aclanthology.org/

rate research

Named Entity Recognition in the Romanian Legal Domain

309 - Association for Computation Linguistics 2021 مقالة

Recognition of named entities present in text is an important step towards information extraction and natural language understanding. This work presents a named entity recognition system for the Romanian legal domain. The system makes use of the gold annotated LegalNERo corpus. Furthermore, the system combines multiple distributional representations of words, including word embeddings trained on a large legal domain corpus. All the resources, including the corpus, model and word embeddings are open sourced. Finally, the best system is available for direct usage in the RELATE platform.

romanian legal domain romanian legal المجال القانوني الروماني اللغة القانونية الرومانية صناعة حمض الفوسفور

HiTRANS: A Hierarchical Transformer Network for Nested Named Entity Recognition

316 - Association for Computation Linguistics 2021 مقالة

Nested Named Entity Recognition (NNER) has been extensively studied, aiming to identify all nested entities from potential spans (i.e., one or more continuous tokens). However, recent studies for NNER either focus on tedious tagging schemas or utiliz e complex structures, which fail to learn effective span representations from the input sentence with highly nested entities. Intuitively, explicit span representations will contribute to NNER due to the rich context information they contain. In this study, we propose a Hierarchical Transformer (HiTRANS) network for the NNER task, which decomposes the input sentence into multi-grained spans and enhances the representation learning in a hierarchical manner. Specifically, we first utilize a two-phase module to generate span representations by aggregating context information based on a bottom-up and top-down transformer network. Then a label prediction layer is designed to recognize nested entities hierarchically, which naturally explores semantic dependencies among different spans. Experiments on GENIA, ACE-2004, ACE-2005 and NNE datasets demonstrate that our proposed method achieves much better performance than the state-of-the-art approaches.

تصنيف متعددة مشروط nested named entity كيان اسمه متداخل صناعة حمض الفوسفور

Neural Text Classification and Stacked Heterogeneous Embeddings for Named Entity Recognition in SMM4H 2021

357 - Association for Computation Linguistics 2021 مقالة

This paper presents our findings from participating in the SMM4H Shared Task 2021. We addressed Named Entity Recognition (NER) and Text Classification. To address NER we explored BiLSTM-CRF with Stacked Heterogeneous embeddings and linguistic feature s. We investigated various machine learning algorithms (logistic regression, SVM and Neural Networks) to address text classification. Our proposed approaches can be generalized to different languages and we have shown its effectiveness for English and Spanish. Our text classification submissions have achieved competitive performance with F1-score of 0.46 and 0.90 on ADE Classification (Task 1a) and Profession Classification (Task 7a) respectively. In the case of NER, our submissions scored F1-score of 0.50 and 0.82 on ADE Span Detection (Task 1b) and Profession span detection (Task 7b) respectively.

وسائل الإعلام الاجتماعية ذات الصلة بالصحة stacked heterogeneous embeddings تدمير المتجانسة مكدسة صناعة حمض الفوسفور

RockNER: A Simple Method to Create Adversarial Examples for Evaluating the Robustness of Named Entity Recognition Models

497 - Association for Computation Linguistics 2021 مقالة

To audit the robustness of named entity recognition (NER) models, we propose RockNER, a simple yet effective method to create natural adversarial examples. Specifically, at the entity level, we replace target entities with other entities of the same semantic class in Wikidata; at the context level, we use pre-trained language models (e.g., BERT) to generate word substitutions. Together, the two levels of at- tack produce natural adversarial examples that result in a shifted distribution from the training data on which our target models have been trained. We apply the proposed method to the OntoNotes dataset and create a new benchmark named OntoRock for evaluating the robustness of existing NER models via a systematic evaluation protocol. Our experiments and analysis reveal that even the best model has a significant performance drop, and these models seem to memorize in-domain entity patterns instead of reasoning from the context. Our work also studies the effects of a few simple data augmentation methods to improve the robustness of NER models.

الأساس المنطقي الاستخراجي create natural adversarial إنشاء الخصم الطبيعي صناعة حمض الفوسفور

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

587 - Association for Computation Linguistics 2021 مقالة

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grain ed structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.

named entity disambiguation entity disambiguation غموض كيان اسمه غزول الكيان صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method

الاسم المسمى للكيان في النص القانوني التاريخي: طريقة فرقة محول وآلة الدولة

Ask ChatGPT about the research

Read More

suggested questions